Editor’s Note


The first two articles in this issue of Tandem 
Systems Review reflect topics of high priority for 
business applications: high performance, fault 
tolerance, and high availability. “The NonStop 
Himalaya K10000 Interprocessor Bus” describes 
enhancements made to the high-speed interpro- 
cessor bus subsystem of the newest and most 
powerful Tandem system, the NonStop 
Himalaya K10000 server. The enhancements 
support the K10000 RISC processor’s increased 
performance capabilities. The latter part of the 
article presents a detailed technical discussion 
of the K10000 IPB architecture. 

Next, “Client/Server Availability” is a study 
of availability in today’s complex client/server 
environment. This two-part article begins by 
presenting a predictive model for evaluating 
client/server availability and applies it to a repre- 
sentative client/server environment. The second 
part of the article discusses a number of ways to 
increase client/server availability. 

The third article, “Automating Call Centers 
With CAM,” describes the Tandem Call Applica- 
tions Manager, software that links applications 
on Tandem NonStop systems with telephone 
switches to provide computer-telephone integra- 
tion. The article discusses basic functions and 
benefits of CAM, how CAM works in an auto- 
mated call center, and architectural considera- 
tions for developers who are building call center 
applications with CAM. 

Finally, we encourage you to fill out and return 
the Reader Survey found at the end of this issue. 
We are very interested in responding to your 
technical information needs. 
Integrity Systems


Integrity FT CM-1475 and 
CO-1475 Systems 
February 1994 


The new CM-1475 and CO-1475 sys- 
tems, based on the MIPS® R4400™, 
150MHz processor, are high-end 
Integrity FT systems for fault-tolerant 
computing. These systems provide 
more than 30 percent improvement in 
price/performance over comparable 
configurations of the older CM-1450 
and CO-1450 systems. 

The base configurations of both 
1475 models include a MIPS R4400, 
150MHz processor, 64 megabytes of 
local memory, and 16 megabytes of 
global memory. Maximum local mem- 
ory on the 1475 systems is 128 mega- 
bytes and maximum total memory 
(local plus global) is 192 megabytes. 
These base configurations do not 
include disks, tapes, or SCSI device 
controllers. The new packaging 
scheme allows users flexibility in 
choosing media and system options. 

The CO-1475 model incorporates 
specialized features to support 
telecommunications central office 
applications. These features include 
compliance with the stringent safety, 
fire resistance, earthquake resistance, 
temperature, power, and grounding 
standards required in telco central 
offices. 

The CM-1475 and CO-1475 models 
use the same system cabinets and 
mass storage cabinets as previous CM
and CO systems. The system cabinet
can store up to 9 disk or tape devices. 
In addition, these systems can support 
two mass storage cabinets with up to 
21 devices per cabinet. 


Integrity FT CM-1455 and 
CO-1455 Systems 
February 1994 


The new CM-1455 and CO-1455 sys- 
tems, based on the MIPS R4000®, 
75MHz processor, are midrange 
Integrity FT systems for fault-tolerant 
computing. These systems provide 
improved price/performance over 
comparable configurations of the older 
CM-1450 and CO-1450 systems. 

The base configurations of both 
1455 models include a MIPS R4000, 
75MHz processor, 64 megabytes of 
local memory, and 16 megabytes of 
global memory. Maximum local mem- 
ory on the 1455 systems is 128 mega- 
bytes and maximum total memory 
(local plus global) is 192 megabytes. 
These base configurations do not in- 
clude disks, tapes, or SCSI device con- 
trollers. The new packaging scheme 
allows users flexibility in choosing 
media and system options. 

The CO-1455 model incorporates 
specialized features to support 
telecommunications central office 
applications. These features include 
compliance with the stringent safety, 
fire resistance, earthquake resistance, 
temperature, power, and grounding 
standards required in telco central 
offices. 

The CM-1455 and CO-1455 models 
use the same system cabinets and 
mass storage cabinets as previous CM 
and CO systems. The system cabinet 
can store up to 9 disk or tape devices. 
In addition, these systems can support 
two mass storage cabinets with up to 
21 devices per cabinet. 


The Product Update department provides brief descriptions of new products announced by Tandem. 
For more information on any of these products, please consult your local Tandem representative. 
Integrity FT CM-1300E System
February 1994 


The new CM-1300E system, based on 
the MIPS R3000®, 25MHz processor, 
is an entry-level Integrity FT system. 
It offers users the benefits of the 
Integrity FT architecture and the 
flexibility for a cost-effective future 
upgrade to a CM-1455 or CM-1475 
system. 

The base configuration of the 
CM-1300E system includes a MIPS 
R3000, 25MHz processor, 16 or 
32 megabytes of local memory, and 
16 or 32 megabytes of global memory. 
Maximum total memory (local plus 
global) is 64 megabytes. The base con- 
figuration does not include disks, tapes, 
or SCSI device controllers. This pack- 
aging scheme allows users flexibility 
in choosing media and system options. 


Integrity NR/4401 Server 
January 1994 


The Integrity NR/4401 is a new low- 
end network resource server that 
offers a cost-effective upgrade path 
from the existing NR/4001 server. 
The NR/4401 provides a 28-percent 
price/performance improvement and 
a 56-percent absolute performance 
improvement over the NR/4001. 

The new server uses a MIPS R4400 
RISC processor running at 150MHz. It
has a memory capacity of 384 mega- 
bytes and an internal disk capacity of 
3 gigabytes. Other standard features 
are the same as for the NR/4001 
server. 


Client/Server Computing Products


Tandem Workflow Image 
System 3.1 
November 1993 


Tandem Workflow Image System 3.1 
is an enhanced version of the previ- 
ously announced Tandem Workflow 
Management product. It incorporates 
all of the basic features of the earlier 
version and includes a number of 
enhancements such as ViewStar ver- 
sion 3.1 software, an expanded com- 
patible-hardware base, and the avail- 
ability of related Tandem Education 
courses and Professional Services. 

Version 3.1 of ViewStar software 
offers client/server tools and capabili- 
ties that improve the reliability and 
performance of complex, distributed, 
high-volume business applications. It 
also makes the development and 
deployment of workflow applications 
faster and easier. The compatible- 
hardware base of Tandem Workflow 
Image System 3.1 has been extended 
to include a number of powerful 
database servers, such as the Tandem 
NonStop Himalaya servers and 
Tandem Integrity NR servers, as well 
as additional workstations and periph- 
eral devices. 

The Tandem training courses and 
Professional Services related to this 
product are aimed at helping users 
plan, design, and implement appropri- 
ate, cost-effective document manage- 
ment and workflow automation 
systems using the ViewStar software 
and Tandem products. 


Windows NT 
January 1994 


Tandem now offers Windows NT on 
its PSX and NDX platforms. Windows 
NT is a full 32-bit, preemptive multi- 
tasking operating system. Its fast, 
32-bit drivers directly manipulate disk 
hardware, which results in better sys- 
tem performance and more consistent 
throughput. This performance gain is 
especially apparent in client/server
or other environments where heavy
client processing loads are the norm. 
Windows NT also protects users’ cur- 
rent software investments by provid- 
ing backward compatibility for appli- 
cations running on MS-DOS and 
Windows operating systems. 


Windows NT Advanced Server 
January 1994 


Windows NT Advanced Server is a 
networking version of Windows NT 
for users implementing client/server 
applications such as database servers, 
messaging servers, or communica- 
tions gateways on different networks. 
This networking version builds on 
the Windows NT operating system, 
adding centralized network manage- 
ment and security functions, as well 
as connectivity to remote clients. The 
Windows NT Advanced Server is a 
complete file and print server, and 
excels as a platform for building
the server portion of client/server
applications. 
Networking Products


NonStop Access for Networking 
December 1993 


NonStop Access for Networking is a 
collection of network components that 
provides the automatic switching of 
Ethernet paths from a primary LAN to 
a secondary LAN in 10Base-T networks. 
The components are a 10Base-T version 
of the 3615 Ethernet controller and a 
Gemini dual-path network interface 
card (NIC). 

To eliminate a single point of fail- 
ure for the LAN attachment to Tandem 
servers, a minimum of two 3615 con- 
trollers are required. Tandem LAN 
Access Method (TLAM) software 
manages and controls both 3615 con- 
trollers. The Gemini NIC provides 
dual paths from each LAN-attached 
PC to the dual networking hubs. The 
Gemini NDIS software driver senses 
both paths and automatically switches 
from the active path to the backup 
path in the event of a failure. 
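
The path-switching behavior described above can be sketched as a small model. This is an illustration only, not the Gemini NDIS driver: the class name `DualPathNic`, its methods, and the path labels are hypothetical.

```python
class DualPathNic:
    """Toy model of a dual-path NIC with one active and one backup path."""

    def __init__(self):
        # Path health as a driver might sense it (labels are hypothetical).
        self.path_up = {"primary": True, "backup": True}
        self.active = "primary"

    def fail_path(self, name):
        """Simulate a link failure on one path."""
        self.path_up[name] = False

    def send(self, frame):
        """Send on the active path, switching paths first if it is down."""
        if not self.path_up[self.active]:
            other = "backup" if self.active == "primary" else "primary"
            if not self.path_up[other]:
                raise ConnectionError("both paths are down")
            self.active = other  # automatic switch to the surviving path
        return (self.active, frame)


nic = DualPathNic()
first = nic.send("frame-1")   # goes out on the primary path
nic.fail_path("primary")
second = nic.send("frame-2")  # the model switches to the backup path
```

The caller never chooses a path; the switch happens inside `send`, which mirrors the idea that failover is transparent to the LAN-attached PC.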

These components work in conjunc- 
tion with existing Ungermann-Bass 
10Base-T local area networking prod- 
ucts. To take advantage of NonStop 
Access for Networking, two 
Ungermann-Bass networking hubs 
are redundantly internetworked to 
provide primary and backup 10Base-T 
Ethernet paths to the 3615 controllers. 
Standard Ungermann-Bass dual 
Ethernet backbone LANs provide 
backup paths between the hubs. 


Access/Stax Networking Hub 
October 1993 


Access/Stax Networking Hub is a 
stackable, segmentable networking 
hub that provides for a port density 
greater than that of any other stackable 
solution on the market today. Up to
five Access/Stax hubs can be stacked
to accommodate departmental growth 
to a maximum of 120 ports. In addi- 
tion, the new 10Base-T hub supports 
up to 3 10Mbps Ethernet segments in
a single 24-port hub, or up to 15 seg- 
ments in a stack of five hubs. These 
hubs can be either unmanaged or man- 
aged through consoles supporting 
Simple Network Management Protocol 
(SNMP) or Hub Management Interface 
(HMI). 
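
The stack limits quoted above follow from simple multiplication, as this short sketch shows. The function name `stack_capacity` is made up for the example; the constants come from the paragraph above.

```python
PORTS_PER_HUB = 24       # "a single 24-port hub"
SEGMENTS_PER_HUB = 3     # up to 3 Ethernet segments per hub
MAX_HUBS_PER_STACK = 5   # "Up to five Access/Stax hubs can be stacked"


def stack_capacity(hubs):
    """Return (ports, segments) for a stack of the given number of hubs."""
    if not 1 <= hubs <= MAX_HUBS_PER_STACK:
        raise ValueError("a stack holds between 1 and 5 hubs")
    return hubs * PORTS_PER_HUB, hubs * SEGMENTS_PER_HUB


full_stack = stack_capacity(MAX_HUBS_PER_STACK)
```

A full stack of five hubs gives 120 ports and 15 segments, matching the figures in the text.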


New Token-Ring Adapter 
October 1993 


The new token-ring adapter now 
available from Tandem provides full 
4Mbps or 16Mbps support for IBM 
and non-IBM token-ring networks. 
The new adapter is fully software con- 
figurable for ease of use and trouble- 
free installation. It features a complete 
software diagnostic package and is 
designed to automatically select either 
unshielded twisted pair (UTP) or 
shielded twisted pair (STP) wiring
to provide full flexibility in network
configuration. 


Workstation and Terminal Products


Indigo2 XL Workstation 
January 1994 


The Indigo2™ XL is a high-end work- 
station with new desktop packaging, 
faster and more fully-featured graph- 
ics, and a 30-percent price/perfor- 
mance improvement over the Indigo® 
R4000 workstations. Two Indigo2 XL
workstations are available: one based
on the MIPS R4000 RISC processor
running at 100MHz and one based
on the R4400 RISC processor running
at 150MHz. Both workstations feature
24-bit color and high-resolution
19-inch monitors and offer a graphics
level that is optimized for advanced
program development, commercial
client/server applications using the
X Window System, and the display of
networks, traffic patterns, or other
scenarios that require monitoring.


PSX SP P/60 Desktop Computer 
December 1993 


The new PSX SP P/60 desktop com- 
puter, housed in a low-profile chassis, 
combines a Pentium processor with 
16 kilobytes integrated memory cache 
and 256 kilobytes of second-level 
cache to provide a great increase in 
processing speed. The PSX SP P/60 
uses an integrated PCI (Peripheral 
Component Interconnect) bus, the lat- 
est standard in local-bus technology. 
The computer’s two PCI slots run 
peripheral devices at accelerated 
speeds. In addition, with the video 
subsystem on the PCI local bus, there’s 
a direct express route between CPU 
and graphics, resulting in faster video 
performance. 

The PSX SP P/60 provides five 
expansion slots (two PCI local bus, 
three ISA), and four drive bays (three 
external, one internal). The two PCI 
slots are backward-compatible with 
ISA, making it possible for users to 
continue to take advantage of the wide 
variety of existing ISA-compatible 
products. 


Enhanced PSX EP-Series 
Desktop Computers 
October 1993 


Tandem now offers the next genera- 
tion of PSX EP-series desktop com- 
puters. The new computers provide all 
of the standard EP-series features such 
as memory expansion to 64 megabytes, 
512-kilobyte graphics memory (expandable
to 1 megabyte), and four
16-bit ISA bus expansion slots. In 
addition, the new EP-series offers 
these enhancements: accelerated 
16-bit local bus graphics, three exter- 
nal drive bays, and upgradability to an 
Intel Pentium OverDrive processor. 
PSX LP-Series Desktop
Computers 
October 1993 


The new PSX LP-series computers 
offer high performance and cost effec- 
tiveness as well as a number of sophis- 
ticated security features. These comput- 
ers are housed in a low-profile desktop 
chassis and are Tandem’s first “green 
PC” offering, exceeding the guide- 
lines established by the EPA’s Energy 
Star power management program. 
The PSX LP-series computers 
feature 32-bit local bus graphics, 
512 kilobytes of standard graphics 
memory, and Pentium OverDrive 
and cache upgradability. They also 
provide such security features as pro- 
tected FlashBIOS, asset management, 
write protection, dual passwords, and 
AST Walk-n-Lock. 


NDX ST P/60 Workstation 
October 1993 


The NDX ST P/60 workstation, based 
on the AST CUPID architecture, is a 
new computer designed to provide the 
power needed for advanced network- 
ing. The ST P/60 uses the same tower 
chassis as the current NDX ST systems 
and can be used with the existing 
NDX ST memory and peripherals. 
With the 64-bit data paths of its 
Pentium 60MHz processor, the ST 
P/60 eliminates the memory bottle- 
neck typical of 32-bit systems. 

The ST P/60 workstation incorpo- 
rates eight EISA expansion slots, five 
drive bays (three external and two 
internal), up to 128 megabytes of RAM 
(expandable on the system board), and 
256 kilobytes of second-level cache 
memory (standard). The tower chassis 
provides slide-out, cable-free SCSI 
backplanes that make it easy to add 
enough hard-drive storage for even 
the largest network. To protect system 
components and data, the ST P/60 
offers a number of security features 
such as AST Walk-n-Lock, protected 
FlashBIOS, write protection, and asset 
management. 


New NDX ST Servers 
December 1993 


Tandem now offers two new models 
of NDX ST servers. Driven by Intel 
486DX or 486DX2 processors, the new 
high-performance NDX file servers 
excel where power and reliability are 
critical. Whereas most present systems 
provide a Pentium upgrade path only 
through a P24T socket for a future 
overdrive processor, the new NDX ST 
servers support both the future over- 
drive chip and a real Pentium proces- 
sor card with up to 512 kilobytes of 
second-level cache. The Pentium card 
slot on the system board can also 
accept a cache upgrade. 

Both new NDX models incorporate 
11 drive bays and eight EISA slots. 
The 8 slide-out, cable-free SCSI drives 
make it easy to add extra hard drives 
quickly, or to perform a quick hot-swap 
with little or no downtime should a 
hard drive fail. BIOS upgrades are 
made simple with AST FlashBIOS. 
The new servers also feature up to 
128 megabytes of RAM, second-level 
cache memory upgradable to 512 kilo- 
bytes (256 kilobytes is standard on the 
486DX2 model), and local-bus graphics
with 1 megabyte of video memory
(expandable to 2 megabytes). Hardware 
and software security is provided 
through a chassis lock, floppy and I/O 
write protection, asset management, 
dual-level passwords, and AST 
Walk-n-Lock. 


MS-DOS 6.2 
January 1994 


All new shipments of Tandem personal 
computers now include MS-DOS 6.2. 
This latest release of MS-DOS makes 
it safer and easier to double one’s disk 
space through the DoubleSpace com- 
pression feature of MS-DOS 6.0. 
Additional performance enhancements 
include faster CD-ROM access through 
an improved SmartDrive and new 
data-protection technology. 


Programming Languages


C++ Programming Language 
December 1993 


Tandem’s C++ programming lan- 
guage environment is available for 
systems using the Tandem NonStop 
Kernel operating system, version D20 
or later. The Tandem C++ develop- 
ment environment has four compo- 
nents: C++ Translator based upon 
USL Cfront; ISO C compiler; Inspect 
debugger; and Tools.h++ foundation 
class library by Rogue Wave Software. 
Cfront is the de facto reference 
implementation of C++. The C++ 
Translator converts C++ source code 
to C source code, which is then com- 
piled by the C compiler. The C com- 
piler is a prerequisite for C++ program- 
ming for Tandem NonStop systems. 
Tandem’s Inspect symbolic debugger 
features direct debugging of C++ 
source code. Inspect is supplied 
on all Tandem NonStop systems. 
Tools.h++ is an industry-standard 
C++ data structure library that pro- 
vides fundamental structures for 
C++ programming. 
The NonStop Himalaya K10000 Interprocessor Bus


The Tandem™ NonStop™ Himalaya™ K10000 server
delivers high performance, data integrity, and fault
tolerance for large, business-critical applications.
Tandem’s newest and most powerful
computer system, the K10000 server is designed 
to support a mixture of online transaction pro- 
cessing (OLTP), decision support, and batch 
applications. 

The K10000 servers have much in common 
with the Tandem NonStop systems that pre- 
ceded them. However, to achieve the highest 
levels of performance, K10000 servers differ 
from previous NonStop systems in several 
significant ways (Kong, 1994). In addition to 
enhancing the processor itself, designers made 
several significant changes affecting the inter- 
processor buses (IPBs), the high-speed buses 
used by the multiple processors in a NonStop 
system to communicate with one another. 
Specifically, designers improved the perfor- 
mance of the IPBs to support the increased 
capabilities of the K10000 processor. 


This article introduces the changes made in 
the Himalaya K10000 IPB subsystem. It begins 
by describing the purpose and functions of 
these enhancements. It then briefly discusses 
K10000 performance results relative to previous 
NonStop systems, which indicate that the IPB 
enhancements achieved their performance goals. 
The remainder of the article describes the archi- 
tecture of the K10000 IPB, focusing on the recent 
changes; these sections are intended for readers 
who want a more complete technical under- 
standing of the IPB. 


NonStop System Architecture 


Tandem NonStop systems are multiprocessor 
systems (sometimes called multicomputer sys- 
tems). To provide fault tolerance, the processors 
in a NonStop system do not share memory, as 
many other multiprocessor systems do. Instead, 
each processor is a completely separate, fully 
functional computer with its own main mem- 
ory, caches, I/O channels, and other compo- 
nents. The processors communicate with one 
another over a pair of high-performance buses, 
collectively called the Dynabus™, which allows
the processors to cooperate and coordinate their
work so they can function as a single-server
system. Figure 1 illustrates the NonStop system
architecture. 
As shown in Figure 1, the IPBs connecting
the multiple processors play a central role in a 
NonStop system. The buses must be highly 
reliable, because all communication between 
processors depends on them. They must also be 
fault tolerant; therefore, the Dynabus comprises 
two IPBs. During normal operation, consistent 
with the Tandem philosophy of avoiding unused 
standby equipment, the software uses both IPBs 
for interprocessor communication. However, if 
a failure occurs, the software automatically 
routes all traffic over the remaining bus. 
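
The use of both buses in normal operation, with rerouting after a failure, can be sketched as follows. This is a toy model under stated assumptions: the bus labels "X" and "Y", the class `Dynabus`, and the round-robin selection policy are illustrative choices; the article says only that the software uses both IPBs and routes all traffic over the remaining bus after a failure.

```python
import itertools


class Dynabus:
    """Toy model of a pair of interprocessor buses with failover."""

    def __init__(self):
        self.healthy = {"X": True, "Y": True}
        # Assumed policy: alternate between the two buses.
        self._next = itertools.cycle(["X", "Y"])

    def mark_failed(self, bus):
        """Record that one bus has failed."""
        self.healthy[bus] = False

    def pick_bus(self):
        """Return the next healthy bus, skipping a failed one."""
        for _ in range(2):
            bus = next(self._next)
            if self.healthy[bus]:
                return bus
        raise RuntimeError("no interprocessor bus available")


bus = Dynabus()
normal = [bus.pick_bus() for _ in range(4)]    # both buses carry traffic
bus.mark_failed("X")
degraded = [bus.pick_bus() for _ in range(4)]  # everything moves to Y
```

Because neither bus is an idle standby, a failure reduces capacity but does not require bringing a cold spare online.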

The IPBs must also provide both high band- 
width (aggregate transfer rate) and low latency 
(elapsed time for a transfer) for interprocessor 
communications. Because the Dynabus is one 
of the few shared resources in a NonStop sys- 
tem, it could become a bottleneck to system 
performance and scalability if it were not care- 
fully designed for high performance. 


Over the years, these performance considera- 
tions have led to several incremental changes in 
the design of the IPBs and the processor’s inter- 
face to them. In the NonStop VLX™ systems 
(introduced in 1986), the bus protocol was altered 
and the bus clock speed increased relative to the 
NonStop TXP™ and earlier systems. The 
NonStop VLX and NonStop Cyclone™ systems 
increased the sizes of the packet queues that 
serve as buffers between the processors and the 
buses. In this tradition, the IPB subsystem was 
further improved to support the increased per- 
formance of the K10000 processor. 


Figure 1. NonStop system architecture.
Supporting the Performance of the
K10000 Processor 


At the core of a K10000 server is the powerful 
MIPS® R4400™ (RISC) microprocessor, running 
at an internal clock rate of 150 MHz. The micro- 
processor contains 32 kilobytes of on-chip caches 
and is further supported by a 4-megabyte off-chip 
cache. Along with other hardware support, a 
K10000 processor can deliver up to twice the 
performance of a NonStop Cyclone processor. 

The designers of the K10000 server had to 
determine whether the existing IPB design (the 
one used on the NonStop Cyclone and NonStop 
Cyclone/R™ processors) could support the in- 
creased performance of the K10000 processor 
without becoming a bottleneck. To answer this 
question, they performed several modeling and 
simulation studies. The designers developed a 
detailed model of a K10000 system and stressed 
it with a variety of workloads. With these mod- 
els, they could predict various properties of the 
IPB subsystem as well as the overall throughput 
of the system under various conditions. 

The conclusions from the study were as 
follows: 


■ The existing IPB design would be a system-performance
bottleneck with processors of the anticipated performance.


■ The performance bottleneck could be alleviated by
changing the processor’s interface to the IPB; the IPB
itself did not have to be changed.


■ The Dynabus+™ architecture of the NonStop Cyclone
systems would provide the highest performance, with four
processors per section being the optimal configuration.¹


¹ The Dynabus+ architecture uses a dual fiber-optic ring to connect up to
four sections of up to four processors per section. This architecture
provided better performance than the flat, single-section architecture of the
NonStop Cyclone/R systems.


Changes in the K10000 IPB 


Guided by the study, the designers modified the 
processor’s interface to the IPB subsystem. The 
K10000 servers introduce a new level of perfor- 
mance in the IPBs, while retaining much of the 
design and many system components from pre- 
vious systems. The K10000 IPB subsystem in- 
cludes the following enhancements: 


■ Direct memory access (DMA) mechanisms
that transfer packets between main memory and
the inbound and outbound interface packet
queues.


■ Asynchronous (nonwaited) sending mechanisms.


■ Send chaining to link together multiple send
requests into a single composite operation.


■ Multiple send channels per bus.


Introduction to DMA 


The concept of DMA is simple: an agent other 
than the processor itself moves data between 
memory and, in this case, the IPB. Previous 
Tandem NonStop systems used DMA agents for 
I/O operations, but not for the IPB. The processor’s
microcode or millicode² performed the
data transfer between memory and the IPB, thus 
taking processor cycles away from the execu- 
tion of processes and the NonStop Kernel. 

For processors based on RISC technology 
(such as the K10000), the cost of moving data 
by the processor is even more significant. RISC 
designs rely on high internal clock rates and 
large caches, so references to main memory and 
to external interfaces such as the Dynabus are 
relatively more costly. DMA engines for both 
sending and receiving data on the IPBs relieve 
the K10000 processor of these burdens, freeing 
up more execution cycles for other work. 


² Designers used the name millicode, which retains some of the flavor of
microcode, to indicate that parts of the millicode software perform the 
same functions on RISC processors that the microcode did on previous 
Tandem NonStop CISC processors. The primary difference is that the 
microcode was programmed in a special-purpose language and operated 
below the TNS instruction-set level, whereas RISC millicode is a set of 
MIPS R3000 or R4400 instructions executing on the hardware platform, 
just as instructions in the NonStop Kernel do. 
Asynchronous Sending


The concept of asynchronous sending is often 
misunderstood; indeed, it is often confused with 
that of DMA transfers. The basic concept has to 
do with waiting. In previous NonStop systems, 
when an IPB send operation was performed, the 
processor had to wait for the send operation to 
be completed before it continued executing. The 
processor could not perform other work until the 
entire message had been split into packets and 
moved from memory into the Out Queue 
(OUTQ). 

Asynchronous transfer is not the same as 
direct memory access. In fact, NonStop systems 
have always had asynchronous receiving mech- 
anisms, even without DMA. These systems were 
interrupt-driven. When a packet arrived in the In 
Queue (INQ), the processor was interrupted, and 
millicode (or microcode) moved the data from 
the INQ into memory. However, between pack- 
ets, the processor resumed executing instruc- 
tions; it did not wait for the next packet. 

The NonStop Himalaya K10000 processors 
use asynchronous sending as well as DMA trans- 
fer engines for both sending and receiving data. 
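
The difference between waited and nonwaited sending can be sketched with a worker thread standing in for a DMA send engine. This is a minimal illustration, not the K10000 mechanism: the names `SendEngine`, `send_nowait`, and `split_into_packets` are hypothetical, and the 16-byte packet size is an arbitrary value chosen for the example.

```python
import queue
import threading


def split_into_packets(message, packet_size=16):
    """Split a message into fixed-size packets (size chosen arbitrarily)."""
    return [message[i:i + packet_size]
            for i in range(0, len(message), packet_size)]


class SendEngine:
    """Stand-in for a DMA engine: fills an out queue on its own thread."""

    def __init__(self):
        self.requests = queue.Queue()
        self.outq = []
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def _run(self):
        while True:
            message, done = self.requests.get()
            if message is None:  # shutdown marker
                break
            self.outq.extend(split_into_packets(message))
            done.set()           # plays the role of a completion signal

    def send_nowait(self, message):
        """Queue the send and return immediately with a completion event."""
        done = threading.Event()
        self.requests.put((message, done))
        return done

    def shutdown(self):
        self.requests.put((None, None))
        self.worker.join()


engine = SendEngine()
done = engine.send_nowait(b"x" * 40)
other_work = sum(range(1000))  # the "processor" keeps executing meanwhile
done.wait()
engine.shutdown()
```

In a waited design, the caller would perform the packet splitting itself and could not start `other_work` until the last packet had been queued.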


Send Chaining 


Since the D00 version of the NonStop Kernel, 
a message on the NonStop system has consisted 
of two parts: a control part that describes the 
message and a data part that contains the actual 
data of the message. When a message is sent 
from one processor to another, the message 
system prefixes a setup section to the message. 
This section helps the receiving processor pre- 
pare to receive the rest of the message. (For 
some messages going to remote processors, the 
message system prefixes an additional section.) 
Thus, a typical message transmission over the 
IPB requires sending three separate, usually non- 
contiguous parts. 

In previous NonStop systems, each part 
would require a separate SEND instruction. 
To reduce the number of interrupts and other 
required overhead, the NonStop K10000 imple- 
mentation allows all parts of a message to be 
bundled together and described by a single chain 
of commands for the millicode and hardware to 
perform. 
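
Send chaining can be pictured as a linked list of send commands that is handed over once and processed as a unit. The sketch below is an assumption-laden illustration: `SendCommand`, `build_chain`, and `run_chain` are hypothetical names, and the idea that a whole chain produces a single completion notification is assumed here from the stated goal of reducing interrupts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SendCommand:
    """One link in a chain: a noncontiguous part of a message."""
    label: str
    data: bytes
    next: Optional["SendCommand"] = None


def build_chain(parts):
    """Link (label, data) pairs into a chain, preserving their order."""
    head = None
    for label, data in reversed(parts):
        head = SendCommand(label, data, head)
    return head


def run_chain(head):
    """Walk the chain as one composite operation."""
    sent = []
    cmd = head
    while cmd is not None:
        sent.append(cmd.data)
        cmd = cmd.next
    completions = 1  # assumed: one notification for the whole chain
    return sent, completions


chain = build_chain([("setup", b"S"), ("control", b"C"), ("data", b"D")])
sent, completions = run_chain(chain)
```

Without chaining, the same three parts would need three separate send requests, each with its own completion handling.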


Multiple Send Channels per Bus 


Often, the NonStop Kernel executing in one 
processor has messages that could be sent con- 
currently to several other processors (on the 
same system or remote systems). Furthermore, 
these messages frequently have different urgen- 
cies or other delivery requirements. 
Sometimes, in previous NonStop systems, 
an urgency inversion condition could exist. 
Assume, for example, that a low-urgency but 
long message was first started for one destination,
and that a high-urgency and shorter message had to be
sent to a different destination before the first send
operation was finished. Even if it used DMA and
asynchronous sending, the NonStop Kernel would not be
able to start the high-urgency send operation until the
long send was completed.

To alleviate this potential problem, the 
K10000 server implements multiple send chan- 
nels on each bus, which allow multiple send 
operations to be outstanding to different desti- 
nations simultaneously. In the preceding exam- 
ple, the long send operation no longer blocks 
the short, time-critical send operation, since the 
latter can proceed concurrently on a different 
send channel. 
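
The urgency inversion example can be made concrete with a small simulation that counts packet slots on one bus. The model is a sketch under assumptions: the function name `completion_slots`, the message sizes, and the round-robin sharing of the bus among channels are all illustrative; the article does not describe how the hardware arbitrates between channels.

```python
def completion_slots(messages, channels):
    """Return {name: slot in which the message's last packet is sent}.

    messages: list of (name, packet_count) in the order they are started.
    Message i is assigned to channel i % channels; each channel sends its
    messages in order, and the bus serves non-empty channels in turn,
    one packet per slot (assumed policy).
    """
    queues = [[] for _ in range(channels)]
    for i, (name, packets) in enumerate(messages):
        queues[i % channels].append([name, packets])

    finished = {}
    slot = 0
    while any(queues):
        for q in queues:
            if not q:
                continue
            slot += 1
            q[0][1] -= 1              # send one packet of the head message
            if q[0][1] == 0:
                finished[q[0][0]] = slot
                q.pop(0)
    return finished


workload = [("long_low_urgency", 100), ("short_high_urgency", 2)]
one_channel = completion_slots(workload, channels=1)
two_channels = completion_slots(workload, channels=2)
```

With one channel, the short message finishes in slot 102, behind the entire long message. With two channels it finishes in slot 4, while the long message still completes in slot 102.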


Figure 2. Processor costs for interprocessor communication
(sending and receiving same-sized messages) on the K10000,
Cyclone, and K1000 processors.


Performance Measurements 


Performance measurements show that the 
changes in the Himalaya K10000 IPB subsys- 
tem, summarized above, were sufficient to sup- 
port the performance capabilities of the K10000 
processor. The graphs in Figures 2 and 3 illus- 
trate these performance results. 


Figure 2 shows the processor time required 
to send and receive messages of various sizes on
three processors: a Himalaya K1000, a NonStop
Cyclone, and a K10000. Of the three, only the
K10000 processor has DMA IPB and asynchronous
sending capabilities. The cost of transferring
data between processors on the K10000 is
extremely low. Moreover, it is nearly constant 
for all message lengths. The Cyclone time, 
while quite low, shows that many processor 
cycles are consumed in transferring long mes- 
sages. The K1000 time shows that this effect 
is relatively worse on a RISC processor that 
has neither DMA nor asynchronous sending 
capability. 

Figure 3 shows the throughput for the same 
three processor types, again as a function of 
message size. For long messages, the K10000 
begins to approach the theoretical point-to-point 
throughput limit of the IPB, slightly over eight 
megabytes per second. (This limit applies only 
to a single processor-to-processor transfer; the 
maximum throughput for multiple senders is 
higher.) 
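
A simple model helps explain why throughput approaches the bus limit only for long messages: each message pays a fixed cost in addition to its transfer time, and that cost matters less as the message grows. The sketch below rounds the limit to 8 million bytes per second, and the 100-microsecond overhead is a hypothetical figure, not a measured K10000 value.

```python
# The article gives the limit as "slightly over eight megabytes per second";
# it is rounded to 8,000,000 bytes per second here.
IPB_LIMIT = 8_000_000


def effective_throughput(message_bytes, fixed_overhead_s):
    """Bytes per second achieved when every transfer pays a fixed cost."""
    transfer_s = message_bytes / IPB_LIMIT
    return message_bytes / (fixed_overhead_s + transfer_s)


OVERHEAD_S = 100e-6  # hypothetical per-message cost of 100 microseconds

short = effective_throughput(1_000, OVERHEAD_S)
long_ = effective_throughput(64_000, OVERHEAD_S)
```

Under these assumed numbers, a 1,000-byte message reaches roughly 56 percent of the limit, while a 64,000-byte message reaches roughly 99 percent.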

The message system performance shown 
in these figures is measured using LINKER, a 
development tool that measures the time (both 
processor and elapsed) needed for a round-trip 
message between a linker process and a listener 
process in different processors. Both processes 
use the privileged (link-level) interface to the 
message system. Therefore, the reported times 
include all of the operating system overhead 
(except for the file system) as well as any hard- 
ware transport delays that a message-based 
application might typically see. The size of the 
linker’s request and the listener’s reply can be 
varied independently. However, in the tests 
reported here, the message sent and the reply 
returned were of equal length, for each message 
size measured. 
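
The measurement method described above can be imitated with two threads exchanging equal-length messages while the caller records elapsed and processor time. This is not the LINKER tool; it is a sketch of the same round-trip idea, and the names `listener` and `measure_round_trip` are hypothetical.

```python
import queue
import threading
import time


def listener(requests, replies):
    """Reply to each request with a message of equal length."""
    while True:
        msg = requests.get()
        if msg is None:  # shutdown marker
            break
        replies.put(bytes(len(msg)))


def measure_round_trip(size, rounds=100):
    """Time request/reply round trips for messages of the given size."""
    requests, replies = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=listener, args=(requests, replies))
    worker.start()

    payload = bytes(size)
    wall0, cpu0 = time.perf_counter(), time.process_time()
    for _ in range(rounds):
        requests.put(payload)
        reply = replies.get()
    wall1, cpu1 = time.perf_counter(), time.process_time()

    requests.put(None)
    worker.join()
    return {
        "reply_len": len(reply),
        "elapsed_per_trip": (wall1 - wall0) / rounds,
        "cpu_per_trip": (cpu1 - cpu0) / rounds,
    }


result = measure_round_trip(1024)
```

Reporting both elapsed and processor time per round trip separates latency from the processor cost of messaging, which is the distinction Figures 2 and 3 draw.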
Architecture of the K10000 IPB
Interface 


The IPB features introduced in the preceding 
sections are implemented within a three-layer 
hierarchy: a hardware layer, a millicode layer, 
and a software layer. The hardware provides the 
DMA transfer engines and supporting hardware. 
The millicode provides a layer of services that 
serve mainly to isolate the software from certain 
implementation details of the hardware, provid- 
ing a more abstract, machine-independent inter- 
face to the software. The software layer is the 
lower layer of the message system, a module 
within the NonStop Kernel operating system. 

The remainder of this article describes how 
these three layers provide the IPB interface for 
the K10000 processor. For information about the 
IPB in general and another perspective on the 
K10000 IPB implementation, refer to the NonStop 
Himalaya K10000 Server Description Manual 
(1993). 


Hardware Layer 


The hardware is the lowest of the three layers 
implementing interprocessor communication. 
In addition to providing data integrity, single- 
fault tolerance, and the other standard Tandem 
features, the hardware was designed to meet 
three major objectives: 


■ Minimize the processor time costs associated
with messaging.

■ Maximize the efficiency of IPB bus use.

■ Eliminate certain sources of failures known
from previous designs.


Another goal was to lower the design time
and costs by using some of the same compo- 
nents (outside of the processor) used on other 
NonStop systems. For example, the K10000 
uses the same FOX™ and Dynabus+ logic boards 
that are used in NonStop Cyclone systems. Fur- 
thermore, the various bus controllers, termina- 
tors, and FOX and Dynabus+ adaptors are 
functionally (although not physically) identical 
to their Cyclone counterparts. 


Figure 3.
Throughput comparison for interprocessor
communication (sending and receiving same-sized
messages) on the K10000, Cyclone, and K1000
processors.
Figure 4.
IPB send engine functional
block diagram.


Send Hardware 

To meet the design objectives for send opera- 
tions, the send hardware incorporates several 
key features. The hardware uses a command- 
chaining DMA engine controlled by the proces- 
sor through a mailbox located in main memory. 
The DMA engine performs send-data fetching 
and packet formatting. A multichannel send 
structure is implemented; the hardware has two 
independent DMA send channels per bus that
multiplex on a packet basis into the send
stream. The send channels are completely inter- 
ruptible and restartable at packet boundaries. 
Finally, the send hardware includes two traffic- 
shaping features, poll limiting and bandwidth 
limiting. Poll limiting prevents a sender from 
wasting bus bandwidth when the receiver is 
busy. Bandwidth limiting minimizes the like- 
lihood of a fast sender overrunning a slow 
receiver. 

Figure 4 shows the functional organization 
of the K10000’s send hardware for a single bus. 
The processor can perform a sequence of one or 
more sends on a single channel by describing 
those sends in a mailbox data structure in mem- 
ory, called a circular command buffer (CCB), 
which can contain up to 16 send segment com- 
mands. There is one mailbox per send channel. 
(Although a mechanism exists for a mailbox to 
chain multiple CCBs, the millicode layer cur- 
rently uses only one CCB per channel.) After 
the CCB is set up, the processor starts the DMA 
send hardware for that channel. 

The DMA send hardware, called the send 
control unit (SCU), fetches a single command 
from the CCB into its local storage (shown as 
SPB0 or SPB1 in Figure 4) and then executes it.
The single command can describe a send seg- 
ment of up to 16 kilobytes of contiguous physi- 
cal memory (though the current millicode limits 
segments to a single page frame of 4 kilobytes). 
The send command is executed by handling as 
many packets as needed. For each packet, the 
SCU fetches data from memory, formats it into 
an IPB packet, and places that packet into the 
OUTQ. Once the packet has been placed in the 
OUTQ without encountering any errors, the SCU 
updates its local working pointers so the next 
packet can be fetched. It then informs the other 
send hardware unit, the packet send unit (PSU), 
to send the packet already in the OUTQ. 
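The way a send might be broken into CCB segment commands can be sketched as follows. This is only an illustration, not the actual millicode: the function names and the (address, length) command layout are hypothetical, and addresses are treated as physical. The 16-command CCB capacity and the 4-kilobyte page-frame limit are the figures given above.

```python
PAGE_SIZE = 4 * 1024       # page-frame size cited in the article
MAX_CCB_COMMANDS = 16      # send segment commands per CCB, per the article


def split_into_segments(start_addr, length):
    """Split a buffer into (address, length) segments that never cross
    a page-frame boundary. Addresses are treated as physical here."""
    segments = []
    while length > 0:
        room_in_page = PAGE_SIZE - (start_addr % PAGE_SIZE)
        seg_len = min(length, room_in_page)
        segments.append((start_addr, seg_len))
        start_addr += seg_len
        length -= seg_len
    return segments


def build_ccb(buffers):
    """Return the segment commands for one CCB (hypothetical layout)."""
    commands = []
    for start_addr, length in buffers:
        commands.extend(split_into_segments(start_addr, length))
    if len(commands) > MAX_CCB_COMMANDS:
        raise ValueError("send does not fit in a single CCB")
    return commands


# A 768-byte buffer that straddles a page boundary becomes two segments.
print(build_ccb([(0x1F00, 0x300)]))
```

A send that needs more than 16 segments is rejected in this sketch; the real millicode presumably handles that case differently, for example by refilling the CCB.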
The PSU operates in the IPB bus clock
regime and follows the protocol set by the IPB 
bus controller. The PSU asks to send a packet 
when the bus controller permits it and sends the 
packet when the bus controller orders it to do 
so. After the packet is sent, the PSU informs the 
SCU whether it was sent successfully or given a 
NAK handshake by the receiver. (A NAK is a 
flow-control feature of the bus controller’s pro- 
tocol that causes the packet to be retransmitted.) 
The PSU does not retry NAKed packets on its 
own; the decision to retry is made by the SCU, 
allowing effective use of the traffic-shaping and 
multiple-channel features. 

When all of the data in a single segment has
been fetched, formatted into packets, and sent 
by the PSU, the SCU fetches the next command 
in the CCB mailbox. The new command is exe- 
cuted just as the first one was. This process con- 
tinues until the end of the chain of commands is 
reached. At the end of the chain, if so requested 
by the last command, the SCU posts a send- 
completion notification interrupt to the proces- 
sor. In addition, if the send should terminate 
abnormally, such as with an uncorrectable 
memory error (UCME) detected in the data 
buffer, the SCU posts an abnormal-completion 
interrupt to the processor. 

The SCU and PSU form a two-stage pipeline 
so that they can operate concurrently, fetching 
and formatting one packet while the previous 
packet is being sent. This pipeline is completely 
emptied at intercommand boundaries within a 
command chain. Therefore, the full performance 
benefit of this pipelined organization is not real- 
ized unless send segments are at least three or 
four packets long. With shorter messages, a sin- 
gle channel typically cannot send at the maxi- 
mum bus rate much of the time. 
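A rough timing model shows why short segments cannot keep the bus busy on a single channel. The model below is an assumption made for illustration only: it treats the SCU fetch-and-format stage and the PSU send stage as taking fixed times per packet (equal by default) and drains the pipeline at every segment boundary. The real stage times are not given in this article.

```python
def bus_utilization(packets_per_segment, fetch_time=1.0, send_time=1.0):
    """Fraction of time the bus carries data for one segment, in a
    two-stage pipeline that is emptied at the end of the segment.
    Stage times are hypothetical, in arbitrary units."""
    stage = max(fetch_time, send_time)
    # The first packet must be fetched before anything can be sent; each
    # later packet overlaps its fetch with the previous packet's send.
    total = fetch_time + (packets_per_segment - 1) * stage + send_time
    return packets_per_segment * send_time / total


for n in (1, 2, 4, 16):
    print(n, round(bus_utilization(n), 2))
```

Under these assumptions a one-packet segment uses the bus only half the time, while utilization climbs steadily as segments grow, which is consistent with the three-or-four-packet guideline above.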

Once the CCB is set up and the DMA engine 
is started, the processor is not involved in send
activities (except for performing some minor
monitoring functions described later in the
Millicode Layer section of this article). With
the hardware structure as shown in Figure 4, the 
processor can set up and start a DMA transfer 
independently on each of the two send channels. 
Whenever both channels are actively processing 
through their command chains, the PSU can be 
kept busy at the maximum bus rate even if the 
two streams each consist of short segments.
Thus, the dual send channels increase efficiency 
in addition to avoiding the time-critical block- 
ages described above. 

Though the structure of the send hardware 
makes maximum use of common hardware 
between send channels, some unique hardware 
cost exists for each channel. Since many chan- 
nel-use schemes could be optimal at the system 
level, it was hard to judge the value of the hard- 
ware expenditure for additional channels. It was 
decided that the best answer was to limit hard- 
ware costs to two channels (the minimum 
needed for concurrency, robustness, and maxi- 
mum performance) and allow the millicode to 
emulate as many channels as desired. To sup- 
port switching at a granularity smaller than a 
CCB, each channel is efficiently interruptible 
and restartable at packet boundaries. 
Figure 5.
IPB receive engine
functional block diagram.


It had been observed on the NonStop Cyclone 
and previous processors that a noticeable portion 
of packet transfers on the IPB were NAKed. Two 
major sources of NAKs were discovered: receiver 
INQ overruns due to long MUTEX (mutual exclu- 
sion) intervals and FOX INQ overruns due to 
small INQ size and slower fiber transmission 
rates. The two traffic-shaping features, poll lim- 
iting and bandwidth limiting, cannot speed up 
the receivers, but they do permit other senders 
to use those packet-transmit windows that would 
have been NAKed, so that, at the system level, 
IPB throughput is higher. 


Receive Hardware 


Figure 5 shows the functional organization of 
the IPB receive hardware for a single bus on the 
K10000. Receive operations underwent fewer 
conceptual changes on the K10000 than did 
send operations. The basic change was the use 
of DMA for receive data. This change required 
moving the packet format-checking and data- 
extraction processes into hardware, which in 
previous systems had been handled by the 
microcode (or millicode). Moving those pro- 
cesses strongly suggested creating a hardware 
copy of the bus receive table (BRT), the archi- 
tectural table that describes receive buffers. It 
was decided, therefore, that millicode would 
encapsulate the BRT and use the IPB hardware 
as its primary storage. Some hardware support 
had to be added to provide the processor with 
atomic access to the BRT. (See the CPU access 
agent in Figure 5.) DMA for receive operations 
(hardware format-checking) permits both buses 
to receive data into memory simultaneously, 
whereas in previous systems the processor could 
perform format checking on received packets 
from only one INQ at a time. Further, the K10000 
pipelines operations so that packet reception and 
packet checking can execute concurrently. 

The hardware performs high-bandwidth 
packet checking by using a dedicated and spe- 
cialized programmable microcontroller called 
the format check unit (FCU). The working logic 
in the FCU is designed specifically to support 
IPB receive functions. It includes a route-word 
checker, a sequence-number checker, a check- 
sum checker, and other checking functions. The 
FCU is controlled by an extensible microcon- 
troller with writeable control store. Since packet 
format checking is complex, and it is this process 
that interfaces with the millicode layer, design- 
ers considered the flexibility of the FCU to be 
important. The FCU design achieves flexibility 
without sacrificing performance. The main 
microcode loop that handles most packets was 
optimized to process packets about 50 percent 
faster than the maximum rate at which they can 
arrive into the INQ. 
After the FCU strips out the formatting information
from each packet, the memory write
unit (MWU) transfers the packet data into mem- 
ory. While the MWU is performing the DMA 
for a packet, the FCU can proceed to handle the 
next packet concurrently, so that a high receive 
streaming rate can be maintained. 

Meeting the performance goals of the FCU 
requires quick access to INQ and BRT data. This, 
in turn, requires that addresses in the BRT be 
physical instead of virtual. (There is no time to 
translate them.) Therefore, the hardware bus 
receive table (HBRT) is made an extension of 
the BRT structure. It includes space for the 
physical addresses of all pages touched by the 
receive buffer. The millicode sets up this infor- 
mation when the transfer is enabled. 
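The setup step can be pictured as follows. This is a minimal sketch, assuming a 4-kilobyte page and a simple dictionary standing in for the memory manager's page table; the function names are hypothetical and do not correspond to actual millicode routines.

```python
PAGE_SIZE = 4 * 1024   # assumed page size, matching the send-side figure


def pages_touched(virt_addr, length):
    """Return the virtual page numbers covered by a receive buffer."""
    if length <= 0:
        return []
    first = virt_addr // PAGE_SIZE
    last = (virt_addr + length - 1) // PAGE_SIZE
    return list(range(first, last + 1))


def build_hbrt_frames(virt_addr, length, page_table):
    """Translate every touched page once, at enable time, so that the
    receive hardware never has to translate an address itself."""
    return [page_table[vpn] for vpn in pages_touched(virt_addr, length)]


# A 200-byte buffer starting near the end of page 0 touches two pages.
frames = build_hbrt_frames(4000, 200, {0: 77, 1: 12})
print(frames)
```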

Receive interrupts in a K10000 normally oc- 
cur at most once per datagram (a sequence of 
consecutively numbered packets), rather than 
once per packet. Although far fewer receive in- 
terrupts are handled than on previous processors, 
care was taken to minimize the processor costs 
of these interrupts. The IPB hardware optimizes 
interrupt encodings so that the most frequent 
receive interrupts require the least expensive 
processor activity. 

In the K10000, the format-checking and 
packet-arrival processes are given full concur- 
rent access to the INQ. The packet lift unit (PLU) 
can place packets in the INQ independently of 
and simultaneously with the FCU’s INQ activity. 
The INQ has high enough bandwidth to support 
multiple FCU accesses while picking up packets 
at the maximum IPB rate. The INQ size is also 
programmable up to 256 packets, so that over- 
run conditions can be made less likely. (Milli- 
code currently uses a 16-packet INQ.) 

Finally, the K10000 includes some additional 
checks on the proper functioning of the IPB pro- 
tocol. These checks allow for earlier detection 
of and smoother recovery from errors that occur 
on the bus, including, for example, the loss of 
bus clocks due to the failure of or removal of a 
bus controller or terminator. 




Physical Implementation 


Figure 6 shows the physical implementation of 
the K10000 IPB subsystem. The subsystem con- 
tains two application-specific integrated circuit 
(ASIC) chips, one for each bus. These share a 
small, fast memory comprising static RAM 
chips. This memory, called the bus receive 
RAM (BR RAM), holds both X and Y INQs, the 
BRT, and the FCU microcode. Finally, discrete 
driver and receiver chips provide the electrical 
interface to the IPB itself. 


Figure 6.
IPB subsystem hardware
block diagram.
Millicode Layer


The Himalaya K10000 IPB millicode is a layer 
of software between the IPB hardware and the 
NonStop Kernel message system. The design of
the millicode layer had three primary objectives:


■ Provide an effective method for the message
system to use the advanced features provided
by the K10000 IPB hardware for sending and
receiving data over the IPBs.

■ Hide as many nonessential implementation
details of the K10000 processor hardware and
IPB hardware as possible.

■ Achieve the preceding objectives with a minimum
of overhead.


Some essential features exposed to the mes- 
sage system are the existence of multiple send 
channels per bus, the asynchronous sending 
capability, and the encapsulation of the BRT. 
The interface between the message system and 
the millicode was designed so that it could be 
used for a wide variety of future IPB designs 
that have the same general characteristics. 

Examples of nonessential implementation 
details hidden from the NonStop Kernel are the 
cache-implementation and cache-coherency 
protocols, the hardware-register formats, and 
the addressing and accessing protocols of these 
registers. 

The interface to the NonStop Kernel com- 
prises a set of privileged millicode routines that 
the kernel can call (for example, to initiate or 
queue a send operation) and an interrupt inter- 
face (which is a minor extension of the existing 
BUSRECEIVE interrupt interface on previous 
NonStop systems). 


Millicode Send Operations 

The millicode enables the software to take 
advantage of asynchronous sending by provid- 
ing a queue-driven interface to the send hard- 
ware. A new data structure, the send information 
block (SIB), carries all the information about a 
send operation (or set of related sends) across 
this interface. 

The fundamental operation of starting a send
was previously performed by executing
a SEND instruction. On the K10000, the message
system instead constructs a SIB describing the send
to be performed and calls a millicode routine, 
Enqueue_Send, to request the initiation of the 
send operation. This millicode routine never 
waits for that send (or any earlier send) to 
finish. Instead, if the send can’t be initiated 
immediately, the SIB is placed in one of sever- 
al queues for later processing. In either case, 
whether the send is initiated immediately or 
not, the Enqueue_Send routine returns control 
to the caller without waiting. 

The millicode provides an efficient routine, 
Send_Status, that the NonStop Kernel can use
to check on the progress of a send operation. 
For cases in which the kernel must override 
the usual K10000 practice of not waiting for a 
send to complete, the millicode provides the 
Wait_on_Send routine to do so. To cancel 
an in-progress send, the kernel can call the 
Cancel_Send routine. In all of these cases, 
the SIB is a parameter to the routine. 
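The queue-driven behavior of these routines can be sketched for a single channel as shown below. The class and method names mirror the routine names in the text, but the SIB contents, the status strings, and the hardware_completed hook are assumptions made for illustration. Wait_on_Send is omitted because a busywait cannot be shown meaningfully without real hardware.

```python
from collections import deque


class SIB:
    """Stand-in for a send information block (fields are hypothetical)."""
    def __init__(self, buffers):
        self.buffers = buffers
        self.status = "new"


class SendChannel:
    def __init__(self):
        self.active = None        # SIB currently owned by the hardware
        self.pending = deque()    # SIBs waiting for the channel

    def enqueue_send(self, sib):
        """Start the send if the channel is idle, else queue it.
        Returns immediately in both cases."""
        if self.active is None:
            self.active = sib
            sib.status = "in progress"
        else:
            self.pending.append(sib)
            sib.status = "queued"

    def send_status(self, sib):
        return sib.status

    def cancel_send(self, sib):
        if sib is self.active:
            self.active = None
            sib.status = "cancelled"
            self._start_next()
        elif sib in self.pending:
            self.pending.remove(sib)
            sib.status = "cancelled"

    def hardware_completed(self):
        """Simulates the hardware finishing the active send."""
        if self.active is not None:
            self.active.status = "done"
            self.active = None
        self._start_next()

    def _start_next(self):
        if self.active is None and self.pending:
            self.active = self.pending.popleft()
            self.active.status = "in progress"
```

The point of the design is visible in enqueue_send: the caller never waits, whether or not the hardware is free.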

As described previously, the K10000 IPB 
hardware can chain send commands (and send 
buffers) together for greater efficiency. A lim- 
ited form of this feature is exposed to the 
NonStop Kernel by allowing information about 
multiple sends to be grouped together in a sin- 
gle SIB. This allows the message system to 
group together frequently related sends such as 
the setup packets, the user’s request control, 
and request data packets. Significant efficiency 
is gained by grouping these together in a single 
SIB. The millicode keeps these operations to- 
gether as it links the SIB into queues, converts 
the SIB information into hardware commands, 
and initiates and tracks the send operation’s 
progress. In most cases, the millicode uses the 
hardware send-chaining feature to initiate all 
the send areas described in a SIB with a single 
operation. 
As mentioned earlier, the K10000 server provides
multiple send channels per bus. The ideal
number of send channels is at least four (for a 
system with FOX or TorusNet™ connections), 
because this number allows the splitting of 
local and remote sends on separate channels 
and also allows for a time-critical send channel 
separate from the normal channels. However, 
gate limitations within the IPB ASICs limited 
the number of separate send channels to two 
per bus. Thus, designers decided to provide the 
illusion of having four separate send channels 
per bus to the message system. The millicode 
accomplishes this by maintaining a dynamic 
mapping between the four channels seen by the 
software and the two channels provided by the 
hardware. 

The IPB millicode maintains a queue of SIBs 
for each of the four send channels (per bus) that 
it provides. At any given time, for each bus, 
two queues are mapped to the two hardware- 
provided send channels, while the other two 
queues are pending. (That is, the two corre- 
sponding emulated channels are not mapped.) 
Whenever a send (SIB) is completed or a new 
send (SIB) is initiated, the millicode reevaluates 
the mapping to achieve the maximum sending 
concurrency, while maintaining the priority 
order of the queues. 
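One simple policy that satisfies this description is to map the two highest-priority nonempty queues to the hardware channels. The sketch below assumes that policy and assumes that lower channel numbers have higher priority; the actual millicode algorithm is not specified in this article and may weigh other factors, such as the cost of interrupting a send already in progress.

```python
def map_channels(queues, hardware_channels=2):
    """Given the four per-bus SIB queues (index 0 = highest priority),
    return the indices of the emulated channels that should currently be
    mapped to hardware send channels."""
    non_empty = [index for index, queue in enumerate(queues) if queue]
    return non_empty[:hardware_channels]


# Channel 0 is idle, so channels 1 and 2 get the hardware; 3 stays pending.
print(map_channels([[], ["sib-a"], ["sib-b"], ["sib-c"]]))
```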

When a SIB is placed into an empty queue 
that is mapped to a send channel, or when a SIB 
reaches the head of a mapped queue because 
earlier SIBs have been completed, the millicode 
initiates sending the data described by that SIB 
by converting the information in the SIB into 
hardware commands placed in the channel’s 
CCB. At this time, cache flushing of the send 
areas is performed (because the R4400 proces- 
sor’s write-back cache may contain copies of 
data newer than those in memory), and the mil- 
licode handles packets that cross page bound- 
aries. (Designers determined that it would be 
more expedient and cost-effective to have the 
millicode copy page-crossing packets to sepa- 
rate memory areas than to have the hardware 
send channels deal with page crossings in 
midpacket.) 


Care was taken to minimize the number of 
IPB interrupts that the software must handle. 
One method has been described: the grouping 
of related send operations together into a single 
SIB at the NonStop Kernel-millicode boundary 
and the grouping of send commands together in 
a single CCB at the millicode-hardware bound- 
ary. In addition, the software takes advantage of 
the fact that in most circumstances, it does not 
need explicit notification of send-completion 
events. 

Under the normal, relatively light IPB loads
that are typical of OLTP applications, the send
channels are idle when each send is started. In
these cases, no interrupt is generated when a
send is completed. Instead, the channel’s status
is checked when each new send is initiated; only
at that time does the millicode notice the
completion status of the previous send operation.

However, when traffic is heavy, queues may 
temporarily form. In these cases, it is preferable 
to incur the expense of handling an interrupt to 
get the next send started at the earliest opportu- 
nity, rather than use some time-based status 
polling. The hardware optionally generates 
send-completion interrupts, a facility the milli- 
code uses only when its queues are not empty. 
Similarly, the millicode optionally generates 
send-completion interrupts, a facility the mes- 
sage system uses only when its queues are not 
empty. 

The millicode periodically monitors the 
progress of a send operation according to a 
parameter supplied in the SIB. If a send is not 
making progress within the time prescribed, it 
is cancelled and a send-timeout interrupt is gen- 
erated. (The millicode then initiates the next 
queued send, if any.) 


Millicode Receive Operations

As discussed earlier, the BRT has changed sig- 
nificantly in the K10000 server. In previous 
NonStop systems, the BRT was a simple data 
structure in main memory, initialized by the 
NonStop Kernel and updated by the microcode 
(or millicode) on each packet reception. How- 
ever, with the advent of a DMA receive engine, 
the overhead and coordination problems of 
keeping the BRT in main memory were unac- 
ceptable. Instead, designers decided to encapsu- 
late the BRT, allowing only controlled access 
to update it, so that the hardware could keep its 
own copy of the BRT information in fast RAMs 
and registers. 

The millicode performs this encapsulation by 
providing a set of routines to the message sys- 
tem to access and update the BRT, while hiding 
the details of the hardware BRT and its access- 
ing protocols. 

To coordinate the sharing of the BRT between 
the software and hardware, each entry in the 
BRT (corresponding to a single source proces- 
sor) is considered to be controlled either by the 
hardware or by the software. This distinction is 
actually implemented by an enable bit within 
the BRT entry. When the enable bit is on, the 
entry is controlled by the hardware, and the 
DMA engine is free to use the entry to transfer 
packets to memory from the corresponding 
source processor, updating the BRT information 
appropriately. When the enable bit is off, the 
entry is controlled by the software, and the 
DMA engine must not transfer packets to mem- 
ory from the corresponding source processor, 
nor may it alter the entry. The message system 
uses five millicode routines (Disable_BRT,
Enable_BRT, Read_BRT, Read_and_Disable_BRT,
and Write_and_Enable_BRT) to read,
write, and transfer control of the BRT entries.
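The ownership rule can be modeled in a few lines. In this sketch the five method names follow the routine names above, while the entry fields, the 16-entry table size, and the hardware_may_receive helper are assumptions for illustration only. The real routines operate on hardware storage with atomic access, which a Python object does not capture.

```python
class BRTEntry:
    def __init__(self):
        self.enabled = False   # False: software owns the entry
        self.fields = {}       # hypothetical receive-buffer description


class BusReceiveTable:
    def __init__(self, processors=16):   # one entry per source processor
        self.entries = [BRTEntry() for _ in range(processors)]

    def disable_brt(self, source):
        self.entries[source].enabled = False

    def enable_brt(self, source):
        self.entries[source].enabled = True

    def read_brt(self, source):
        return dict(self.entries[source].fields)

    def read_and_disable_brt(self, source):
        """Take the entry back from the hardware and return its contents."""
        entry = self.entries[source]
        entry.enabled = False
        return dict(entry.fields)

    def write_and_enable_brt(self, source, fields):
        """Describe a receive buffer and hand the entry to the hardware."""
        entry = self.entries[source]
        entry.fields = dict(fields)
        entry.enabled = True

    def hardware_may_receive(self, source):
        """The DMA engine may use an entry only while its enable bit is on."""
        return self.entries[source].enabled
```

The combined routines matter because reading and transferring control in one step leaves no window in which both sides believe they own the entry.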


In addition, the millicode provides two inter- 
face routines to allow the message system to 
turn the DMA receive engines on and off. Recall 
that this function was previously performed by 
the interrupt masks; when IPB interrupts were 
disallowed, transfer from the INQs to memory 
was also disallowed. However, more concur- 
rency is achieved if these transfers can take 
place even while the processor is under MUTEX 
for some reason unrelated to the IPB opera- 
tion. Therefore, this function is now separate- 
ly controlled by the two millicode routines, 
Set_Receive_Enable and Reset_Receive_Enable. 
The X and Y buses can be controlled separately. 
In addition, when the hardware completes an 
IPB reception, the receiving DMA engine always 
disables itself until the software has had a chance 
to process the completion interrupt. 


Message System Layer 


The message system is the component of the 
NonStop Kernel that provides messaging ser- 
vices to the rest of the kernel, to system pro- 
grams, and, through the NonStop Kernel file 
system, to applications programs (Chandra, 
1985). It provides a request-reply message 
interface to processes running on the same 
processor or different processors in a Tandem 
NonStop system connected by the Dynabus. It 
also provides the same message protocols to 
processes running on remote systems intercon- 
nected by the FOX or TorusNet fiber-optic net- 
works or by Tandem’s Expand™ networking 
software, which supports a number of physical 
interconnect media. 

The message system is the highest layer of 
the IPB subsystem described in this article. It 
communicates with the IPB hardware (the low- 
est layer) through routines supplied by the mil- 
licode (the middle layer). Of the three layers 
discussed here, the message system has under- 
gone the least amount of change. Its highest 
sublayer, interfacing to its clients, has not 
changed at all, while its lowest sublayer, inter- 
facing to the hardware, was entirely rewritten 
to use the new millicode routines. 
Interface Changes


Because the message system’s high-level inter- 
faces to its clients are entirely unchanged, client 
software does not need to be modified, in spite 
of the significant changes in the IPB subsystem. 
The main clients of the message system are the 
NonStop Kernel file system, TMF™ (Transaction 
Monitoring Facility), the Input Output Process 
Request Manager (IOPRM), privileged I/O and 
communications processes, and privileged user 
applications. 


A few NonStop system processes use the IPB
to communicate with processors that are “down”
(not running the NonStop Kernel) and cannot
use the request-reply protocol provided by the
message system. Such processes include the
RELOAD utility, which loads the operating system
image to down processors and brings them
up; the RCVDUMP utility, which receives an
image of the memory from a halted processor;
the Tandem Maintenance and Diagnostics
System (TMDS); and the IPBMON process, the
system process that controls and monitors FOX
and TorusNet. Before the K10000 IPB changes,
these processes used the SEND NonStop processor
instruction to send data on the bus. They
also directly manipulated the BRT before receiving
data on the bus. With the K10000 IPB
changes, the message system has encapsulated
the IPB for its low-level clients through
two new processor-independent routines,
XMIT_PROC_ and UPDATE_BRT_. The
XMIT_PROC_ routine sends data on the bus
and the UPDATE_BRT_ routine sets up a BRT
entry before receiving data on the bus.


Major Features of the K10000 Message System

The message system uses most of the new facilities
provided by the K10000 IPB hardware and
millicode. The main features of the K10000
message system that differ from those on earlier
processors are described in the following
paragraphs.


Asynchronous Sending. Sending a message does
not block the processor from doing other work.
The message is queued to a send engine that
asynchronously sends the message.

On earlier processors, calls to the MSG_LINK_
and MSG_REPLY_ routines waited (with some
exceptions) until the entire message was placed
in the IPB hardware OUTQ. For messages larger
than the OUTQ size, this implied that the processor
idled without executing useful instructions
until all of the message (except the last OUTQ
worth) was sent out on the IPB. When a processor
idles, waiting for something to happen, it
is said to be busywaiting. On the K10000, the
MSG_LINK_ and MSG_REPLY_ calls queue the
request to the millicode and return immediately
without waiting. Waited calls such as
MSG_READCONTROL_ and MSG_READDATA_,
which require data transfer from the requester
processor, do not behave differently on the
K10000 processor than they do on previous
processors.

Asynchronous sending results in better proces- 
sor utilization. For an IPB-bound application, the 
improvement is dramatic. Figure 2 shows the 
differences in processor utilization between 
the Himalaya K10000, Himalaya K1000, and 
NonStop Cyclone processors. 


Multiple Buffers per Send Operation. As dis- 
cussed earlier, a typical message sent over the 
IPB has three noncontiguous parts: the setup, 
control, and data areas. Since the SEND instruc- 
tion on earlier processors could send only one 
buffer at a time, multiple SEND instructions were 
required to send a message over the IPB. With 
the synchronous IPB interface on those proces- 
sors, the multiple sends did not increase the 
message overhead appreciably, but with the 
asynchronous IPB interface on the K10000, mul- 
tiple sends would require extra send-completion 
interrupts. The interrupt overhead is eliminated 
by describing the separate send buffers in a 
single SIB and using a single millicode 
Enqueue_Send operation. 
Multiple Send Channels. For each bus, the IPB
millicode provides the message system with 
four send channels with different send priori- 
ties. The two channels with highest priority are 
used to send waited messages generated by the 
NonStop Kernel (such as the IamAlive messages
exchanged by all NonStop processors
every 1.2 seconds). The two channels with low- 
est priority are used to send user messages. 


Encapsulated BRT. The BRT is not shared by the 
IPB millicode (or microcode) and the NonStop 
Kernel. It resides in the IPB hardware and is 
encapsulated for the message system by the 
millicode. The message system uses millicode 
routines to read and update the BRT. 


Separate Mechanism for Disabling Buses. 
Separate mechanisms are used to disable bus 
receive software interrupts and transfers from 
the IPB INQ into memory. On earlier proces- 
sors, a single interrupt mask bit served both 
purposes. On the K10000, millicode routines 
are called to stop and restart IPB transfers into 
memory. This enhancement means that bus 
receptions are not stopped when interrupt han- 
dlers (except the BUSRECEIVE interrupt han- 
dler) are running and when interrupts are 
disabled. 


DMA Sending and Receiving. Both send and 
receive engines use DMA to access the data buf- 
fers. The DMA engines use physical addresses 
to access memory, while the message system 
clients work with virtual addresses. The message
system uses a memory manager routine
to translate the virtual addresses to page-frame
numbers before invoking the DMA engines. 


Send Implementation 


The two primary changes in the message sys- 
tem send implementation are the use of asyn- 
chronous sending and the use of multiple send 
channels. 


Use of Send Channels. As stated earlier, the 
K10000 IPB millicode provides four send chan- 
nels per bus, two for local transmissions and 
two for remote transmissions. The channels are 
assigned different send priorities, with lower 
channel numbers having higher priority. 

The message system uses channel 0 exclu- 
sively for sending unacknowledged datagrams 
to other processors on the same NonStop sys- 
tem. These datagrams include time-critical 
unsequenced IPB packets (with the special 
sequence number of -1) such as the IamAlive
packets, regroup and poison packets (sent when 
a processor is slow to respond or has been 
declared down), and datagrams sent using the 
XMIT_PROC_ interface by system utilities such 
as RELOAD and RCVDUMP. All sends on chan- 
nel 0 are waited. The processor busywaits until 
the transmission is completed by calling the 
Wait_on_Send millicode routine. (The proces- 
sor can service interrupts while busywaiting, 
but it cannot run processes.) 

The message system uses channel 1 exclusively
for sending unacknowledged datagrams
to other processors over the FOX network or 
TorusNet. These include time-critical unse- 
quenced handshake packets exchanged by 
IPBMON processes when establishing or modi- 
fying remote connections, periodic idle packets 
exchanged to preserve these connections, and 
packets sent to the FOX or TorusNet controller 
board, the local bus unit (LBU). These send 
operations also cause the processor to busywait 
until the transmission is completed. 

Channels 2 and 3, the low-priority channels, 
are used to send acknowledged messages (all 
user messages and most NonStop Kernel mes- 
sages) and their acknowledgements (which are 
sent as unsequenced packets). The message sys- 
tem uses channel 2 to send to other processors 
on the same system, and channel 3 to send to 
remote processors on the network. These mes- 
sages are queued to the millicode and do not 
cause busywaiting. 
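
The channel assignment described above can be summarized in a short Python sketch. This is only an illustration of the rules stated in the text; the names SendChannel, select_channel, and is_waited are hypothetical and are not part of the millicode or message system interfaces.

```python
from enum import IntEnum


class SendChannel(IntEnum):
    """Four send channels per bus; a lower number means a higher send priority."""
    LOCAL_DATAGRAM = 0    # unacknowledged datagrams, same system (waited)
    REMOTE_DATAGRAM = 1   # unacknowledged datagrams, FOX or TorusNet (waited)
    LOCAL_MESSAGE = 2     # acknowledged messages and their acks (queued)
    REMOTE_MESSAGE = 3    # acknowledged messages and their acks (queued)


def select_channel(is_datagram: bool, is_remote: bool) -> SendChannel:
    """Pick a channel from the traffic class and the destination."""
    if is_datagram:
        return SendChannel.REMOTE_DATAGRAM if is_remote else SendChannel.LOCAL_DATAGRAM
    return SendChannel.REMOTE_MESSAGE if is_remote else SendChannel.LOCAL_MESSAGE


def is_waited(channel: SendChannel) -> bool:
    """Sends on channels 0 and 1 busywait; sends on channels 2 and 3 are queued."""
    return channel in (SendChannel.LOCAL_DATAGRAM, SendChannel.REMOTE_DATAGRAM)
```

For example, select_channel(False, True) yields channel 3, the queued channel for acknowledged messages to remote processors.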

Limiting Millicode Queue Lengths. The message
system supports a transmission window of four
outstanding unacknowledged transmissions
to each destination processor. On a FOX or
TorusNet system, if there were no other limit- 
ing factor, the millicode send queues could 
contain up to 892 such outstanding messages 
(60 to local processors and 832 to remote pro- 
cessors). Although the millicode does not re- 
strict the size of the message queues for the 
send channels, there are practical reasons to 
limit the queue lengths. 

A potential problem with allowing send 
queues to contain up to four SIBs to each desti- 
nation is that the time to receive acknowledge- 
ments for the outstanding transmissions could 
get very long. The message system uses a time- 
out mechanism for these acknowledgements to 
detect lack of progress of a message. The time 
allowed for the timeout must be significantly 
greater than the elapsed time to get an acknowl- 
edgement back under heavy IPB message traffic. 
Allowing the queues to hold hundreds of SIBs 
would require such a long timeout that it would 
delay the timely detection of real problems. 

Long millicode queues would also consume 
large amounts of memory. Each queue element 
is a SIB, a fairly large data structure. 

To avoid both problems, the K10000 message
system implementation limits the length of a 
nowaited channel queue to one sequenced mes- 
sage and one unsequenced acknowledgment 
packet. On each processor, one sequenced and 
one unsequenced SIB are preallocated for each 
channel on each bus. These SIBs are used for 
sending messages to all destinations using that 
channel. 
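
The queue-length figures above follow from simple arithmetic, sketched here in Python. The destination counts (15 local, 208 remote) are not stated in the text; they are inferred by dividing the quoted totals by the window size. The per-processor bound assumes that only channels 2 and 3 are nowaited and that there are two buses (X and Y).

```python
WINDOW = 4  # outstanding unacknowledged transmissions per destination (from the text)


def max_outstanding(destinations: int, window: int = WINDOW) -> int:
    """Upper bound on queued transmissions if only the window limited them."""
    return destinations * window


# Destination counts inferred from the quoted totals (60 / 4 and 832 / 4).
LOCAL_DESTINATIONS = 60 // WINDOW     # 15
REMOTE_DESTINATIONS = 832 // WINDOW   # 208

unbounded_total = (max_outstanding(LOCAL_DESTINATIONS)
                   + max_outstanding(REMOTE_DESTINATIONS))

# K10000 limit: one sequenced message plus one unsequenced acknowledgment
# per nowaited channel queue.
SIBS_PER_NOWAITED_QUEUE = 2
NOWAITED_CHANNELS_PER_BUS = 2   # assumption: channels 2 and 3
BUSES = 2                       # assumption: X and Y
bounded_total = SIBS_PER_NOWAITED_QUEUE * NOWAITED_CHANNELS_PER_BUS * BUSES

print(unbounded_total, bounded_total)
```

Under these assumptions the bound drops from 892 queued SIBs to 8 per processor, which is what keeps both the acknowledgement timeout and the memory cost small.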


Sending a Message. When a message is to be 
sent to another processor, the message system 
picks the current bus (X or Y) and a channel 
(based on whether it is a local or remote mes- 
sage). Then it checks to see if the SIB for the 
selected bus and channel is free by calling the 
Send_Status millicode routine. If the SIB is 
free, the message system copies the send infor- 
mation into it. The SIB contains multiple send 
areas to describe the multiple buffers that con- 
stitute a message going over the IPB. 


The message system then calls the 
Enqueue_Send millicode routine to initiate the 
send operation. A single millicode operation 
is sufficient to send the multiple buffers form- 
ing a message. The millicode places the SIB 
in a queue for the channel. When the channel 
becomes free, the millicode commands the 
send channel to begin sending the data associ- 
ated with the SIB at the head of the queue. 

If the Send_Status routine indicates that the
SIB is in use (either being transmitted or awaiting
transmission), the message system requests
the millicode to invoke a special interrupt when
the transmissions associated with the SIB are
completed. The message is then placed at
the tail of a queue of messages waiting for
the SIB to become free. When the SIB-completed
software interrupt is invoked (through
the BUSRECEIVE interrupt vector), the interrupt
handler removes the message at the head of
the queue of messages waiting for the SIB and
arranges to have the message sent.
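
The send path described above can be modeled as a small Python sketch. It is a simplified simulation under stated assumptions, not the message system code: the classes Sib and ChannelSender are hypothetical, the busy flag stands in for the result of Send_Status, and appending to the sent list stands in for the Enqueue_Send call.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Sib:
    """Stand-in for the preallocated SIB of one bus and channel."""
    busy: bool = False
    payload: object = None


@dataclass
class ChannelSender:
    sib: Sib = field(default_factory=Sib)
    waiting: deque = field(default_factory=deque)  # messages waiting for the SIB
    sent: list = field(default_factory=list)       # sends handed to the millicode

    def send(self, message) -> bool:
        """Start the send if the SIB is free; otherwise queue the message."""
        if self.sib.busy:                 # Send_Status reports the SIB in use
            self.waiting.append(message)  # tail of the waiting queue
            return False
        self._start(message)
        return True

    def _start(self, message) -> None:
        self.sib.busy = True
        self.sib.payload = message        # copy the send information into the SIB
        self.sent.append(message)         # stands in for Enqueue_Send

    def on_sib_completed(self) -> None:
        """SIB-completed interrupt: free the SIB, then send the head waiter."""
        self.sib.busy = False
        self.sib.payload = None
        if self.waiting:
            self._start(self.waiting.popleft())
```

Calling send three times in a row starts the first message and queues the other two; each call to on_sib_completed then starts the next queued message in arrival order.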


Receive Implementation 


There are two primary changes in the message 
system receive implementation. First, the BRT 
resides in the IPB hardware and is encapsulated 
by the millicode using a procedural interface. 
A new message system routine, UPDATE_BRT_, 
was created for the NonStop Kernel to update 
the BRT. This routine invokes the millicode 
BRT procedures. 

Second, in normal operation, bus reception 
(transferring data from the IPB INQ into mem- 
ory) is disabled only while processing the com- 
pletion of a message reception. In previous 
processors, IPB data transfer was also disabled 
when BUSRECEIVE software interrupts were 
disabled. 


Receiving a Message. As indicated above, on
the K10000 server, the BRT resides in the IPB 
hardware, and the message system uses milli- 
code routines to access it. To read a BRT entry, 
the Read_and_Disable_BRT millicode routine is 
called if the entry is enabled and the Read_BRT 
routine if it is already disabled. To update a BRT 
entry, the message system disables the BRT entry 
(unless it is already disabled or the buses are 
both disabled) and updates it by calling the 
Write_and_Enable_BRT millicode routine. 

When all packets of a sequenced transmission 
are received from a source processor on a partic- 
ular bus, the IPB hardware turns off receptions 
on that bus and disables the BRT entry for the 
source processor. The hardware then posts a 
millicode interrupt. The millicode interrupt han- 
dler sets up the interrupt parameters and invokes 
the BUSRECEIVE software interrupt handler. 
BUSRECEIVE processes the received message 
and sets up for the next message to be received 
from the same source processor. It then updates 
and reenables the BRT entry by calling the 
Write_and_Enable_BRT millicode routine and 
reenables receptions on the bus by calling the 
Set_Receive_Enable millicode routine. 

When an unsequenced packet is received on 
a bus, no BRT entry is used and hence no BRT 
entry is disabled. The hardware turns off recep- 
tions on the bus and interrupts the millicode, 
which in turn invokes the BUSRECEIVE inter- 
rupt handler. BUSRECEIVE processes the packet 
received, but does not update the BRT. It then 
reenables receptions on the bus by calling the 
Set_Receive_Enable millicode routine. 
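
The two reception cases above differ only in whether a BRT entry is involved. The following Python sketch models that difference. It is an illustration under assumptions: the class BusReceiver and its method names are hypothetical, and the comments mark where the real implementation would call the Write_and_Enable_BRT and Set_Receive_Enable millicode routines.

```python
from dataclasses import dataclass, field


@dataclass
class BusReceiver:
    receive_enabled: bool = True
    brt_enabled: dict = field(default_factory=dict)  # source processor -> enabled?
    log: list = field(default_factory=list)

    def hardware_complete(self, source: int, sequenced: bool) -> None:
        """Hardware side: turn off receptions; disable the BRT entry if one was used."""
        self.receive_enabled = False
        if sequenced:
            self.brt_enabled[source] = False
        self.busreceive(source, sequenced)  # reached through the millicode interrupt

    def busreceive(self, source: int, sequenced: bool) -> None:
        """Software handler: process, then reenable what the hardware turned off."""
        self.log.append((source, "message" if sequenced else "packet"))
        if sequenced:
            self.brt_enabled[source] = True  # stands in for Write_and_Enable_BRT
        self.receive_enabled = True          # stands in for Set_Receive_Enable
```

In both cases the bus ends up reenabled; only a sequenced reception touches the BRT entry for the source processor.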


BUSRECEIVE can also be invoked in the case 
of receive errors (such as checksum errors or 
sequence errors), bus errors (such as bus-con- 
troller parity errors), and some send errors (such 
as send timeouts). Some of these errors do not 
cause BRT entries to be disabled, but others do. 
Through special bits in the interrupt parameter 
words, the millicode informs BUSRECEIVE if 
the BRT entry or bus has been disabled. In these 
error cases, BUSRECEIVE enables the BRT entry 
or the bus only if necessary. 

Thus, bus reception is turned off by the hard- 
ware when message reception is completed on 
a bus and reenabled at the end of BUSRECEIVE 
processing for that reception. Bus reception and 
BRT entries are not disabled by the MUTEX 
facility or by entry into software interrupt han- 
dlers, as was done on previous processors. The 
only time reception on a bus is turned off by 
the NonStop Kernel is when it brings down a 
bus because too many receive errors (such as 
checksum errors) occurred. To do this, it calls 
the Reset_Receive_Enable millicode routine. 
When this happens, the error condition should 
be corrected. An operator can then reenable 
receptions by using the TACL™ (Tandem
Advanced Command Language) XBUSUP or 
YBUSUP command. 


BRT Accesses Outside of BUSRECEIVE. When a 
system is coldloaded, all BRT entries are initial- 
ized to expect a setup message with a packet 
sequence number of zero. The BRT is also 
accessed by Tandem system processes such as 
RCVDUMP and IPBMON to set up receptions 
from down processors (processors not running 
the NonStop Kernel). The RCVDUMP process 
uses the BRT to set up memory-dump reception 
from a halted processor and the IPBMON process 
uses it to receive messages from the FOX or 
TorusNet controller board, which is addressed
as a special processor on the network.

To simplify updating a BRT entry, the message
system now provides the UPDATE_BRT_
routine, which takes as parameters the BRT 
entry values to be updated. The routine works 
on all hardware platforms. On the K10000, it 
disables the hardware BRT entry and then calls 
the Write_and_Enable_BRT millicode routine. 
UPDATE_BRT_ is called by the processor ini- 
tialization code to initialize the BRT when the 
processor is coldloaded or reloaded. It is also 
called by the RCVDUMP and IPBMON processes.


Error Packet Reception Areas. When the DMA 
engine receives an unsequenced IPB packet or a 
packet with an error, the packet is written into a 
special memory buffer called the error packet 
reception area. On the K10000, when one bus is 
disabled due to an error packet reception, the 
other bus is still enabled and can receive pack- 
ets and detect errors. Therefore, a separate error 
packet area is provided for each bus. (In previ- 
ous processors, a single error packet reception 
area was sufficient because reception on both 
buses was disabled when the error packet area 
was filled by either bus.) 


Conclusion 


The NonStop Himalaya K10000 server provides 
higher performance than any previous Tandem 
system while maintaining compatibility with all 
existing Tandem user applications. Its Dynabus 
subsystem was designed to provide high perfor- 
mance so that the K10000 processor can deliver 
its full potential, not only in the OLTP applica- 
tions Tandem systems have traditionally sup- 
ported, but also in batch and decision support 
applications, and in combinations of these 
applications. System performance measure- 
ments have verified that these design goals 
have been met. 

Client/Server Availability


Wherever computers are used, loss of user time
through the failure of a hardware or software
component has always been a significant concern. Ten
years ago, when the most common configura- 
tion was a set of terminals connected to a main- 
frame, computer systems and their interconnec- 
tions were usually fairly simple, and analyzing 
the causes and costs of user downtime was rela- 
tively straightforward. 

The current world of client/server computing 
is far more complicated. In a client/server envi- 
ronment, individual workstations on a local area 
network (LAN) may be connected to multiple 
servers and possibly a mainframe. A single net- 
work may involve several types of transmission 
media, such as Ethernet and fiber-optic cable, 
and require multiple bridges, routers, and other 
network devices. Successful execution of a single
database query often requires that five or
more pieces of equipment and the software that
runs on them all be operating properly. In such 
an environment, it can be difficult to analyze all 
possible types of failures and their costs in user 
downtime. This, in turn, makes it difficult to 
arrive at conclusions about probable weak points 
in a client/server environment and the most 
effective ways to reduce downtime. 

This article discusses client/server availability
and ways of reducing user outage times.¹
User outage time is the amount of time that 
individual users in a client/server environment 
would be expected to lose due to outage causes 
of all types, whether at the level of a single 
client, a server, or an entire network. The 
article consists of two main parts. Part 1, 
“Client/Server Availability Model,” presents 
a predictive model for evaluating client/server 
availability and applies the model to a represen- 
tative client/server environment. It provides out- 
age statistics covering many types of failures that 
can affect hardware and software components. 
A significant finding is that without fault toler- 
ance, servers account for more user outage 
minutes than any other single factor. Part 2, 
“Improving Availability in a Client/Server 
Environment,” beginning on page 32, discusses 
ways of increasing client/server availability 
through operational improvements, greater 
fault tolerance, and other means. 


¹ For a longer version of this article, with a list of references on the topic of
availability and sources for outage statistics, see NonStop Availability in a
Client/Server Environment (Wood, 1994).

Table 1.
Relationship between percentages and annual outage minutes.

Availability              90%      99%      99.9%    99.99%   99.999%
Annual outage minutes*    50,000   5,000    500      50       5

*For purposes of illustration, maximum availability is treated as 500,000
minutes, rather than the full 525,600 minutes of a calendar year.


Part 1: Client/Server Availability Model

Given the hardware of ten years ago, a complex client/server environment
would probably be down as often as it was up, and an analysis of
availability would primarily be concerned with hardware outages. However,
hardware reliability has improved dramatically in the last decade, and
the analysis of availability must give equal consideration to many other
causes of outages, such as software bugs, operator error, and power
failures. Planned outages must also be included in an analysis, since
scheduled maintenance periods are being eliminated as the need for
continuous availability of business-critical applications increases.


Defining and Measuring Availability

Availability, classically defined as

$$\text{Availability} = \frac{\text{system uptime}}{\text{system uptime} + \text{system downtime}}$$

is a standard system performance measure. Availability is usually
expressed as a probability or percentage, for example, 99.9 percent
availability, but this metric can be difficult to use or envision.
Unavailability, or outage time, defined as 1 - availability and expressed
as annual downtime, is more useful. For example, a design change that
increases availability from 99.9 percent to 99.99 percent may not appear
very significant. In contrast, expressing the change as a decrease in
downtime from 500 minutes per year to 50 minutes per year does appear
significant and better conveys the true impact of the design change on
business productivity. Table 1 shows the relationship between percentages
and annual outage minutes.

In a distributed client/server environment, it is not very meaningful to
define availability in terms of system failure and to measure it in terms
of system uptime and system downtime. If a single PC fails, the user of
the PC cannot access the application, but all other users can continue
operating without any loss of performance (unless the PC has failed in a
mode that causes an outage for other users). If 1 out of 1,000 users
cannot access the system, is the system down? If not, what about 10 out
of 1,000 or 100 out of 1,000?

To accurately reflect the impact and scope of 
an outage, it is important to measure client/server 
availability from a user point of view rather than 
a system point of view. Such a measure needs 
to include both the duration of an outage and 
the number of users affected. Accordingly, the 
predictive availability model presented in this 
article is concerned with user availability, 
defined as 


$$\text{User availability} = \frac{\text{user uptime}}{\text{user uptime} + \text{user downtime}}$$


where user uptime is time during which users 
can effectively use their applications. If an average
of 1 user out of 1,000 cannot effectively use
needed applications, that is equivalent to 99.9 
percent user availability or, in terms of Table 1, 
an average of 500 annual outage minutes per 
user. For the remainder of this article, user avail- 
ability is expressed in terms of annual user out- 
age minutes. Measuring client/server availability 
as user availability and user outage time avoids 
making an artificial decision about the number 
of down users that is equivalent to a system 
outage. 
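
The conversion between availability percentages and annual outage minutes, and the definition of user availability, can be checked with a short Python sketch. The function names are hypothetical. The default of 500,000 minutes per year is the rounded figure that Table 1 uses for illustration; a calendar year has 525,600 minutes.

```python
MINUTES_PER_YEAR = 525_600       # full calendar year
ILLUSTRATIVE_MINUTES = 500_000   # rounded figure used in Table 1


def annual_outage_minutes(availability: float,
                          minutes_per_year: int = ILLUSTRATIVE_MINUTES) -> float:
    """Unavailability (1 - availability) expressed as minutes of downtime per year."""
    return (1.0 - availability) * minutes_per_year


def user_availability(average_users_down: float, total_users: int) -> float:
    """Fraction of user time during which users can use their applications."""
    return 1.0 - average_users_down / total_users


if __name__ == "__main__":
    for a in (0.90, 0.99, 0.999, 0.9999, 0.99999):
        print(f"{a:.3%} -> {round(annual_outage_minutes(a)):,} minutes per year")
    # 1 user down out of 1,000, on average
    print(f"{user_availability(1, 1000):.1%}")
```

With 1 user out of 1,000 down on average, user availability is 99.9 percent, which corresponds to about 500 annual outage minutes per user on the Table 1 scale.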

Table 2.
Definitions of outage categories.

Outage category    Definition

Physical           Physical faults or failures in the hardware. Any type of
                   hardware component failure belongs in this category.

Design             Bugs in design and design failures in hardware and software.
                   An example is an application change that introduces
                   unexpected problems.

Operations         Errors caused by operations personnel and users, due to
                   accidents, inexperience, defective procedures, or malice.

Environmental      Power or cooling system failures, failure of external
                   network connections (for example, a leased-line outage),
                   natural disaster (earthquake, flood), accidents, terrorism.

Reconfiguration    Any planned outage. Examples include downtime required for
                   planned maintenance, such as upgrading hardware or software,
                   and configuration changes, such as adding a new disk or
                   restructuring a database.



Client/Server Outage Data 


An effort was made to collect as much 
client/server outage data as possible. In the 
process, it was surprising to find that very few 
companies keep complete client/server outage 
data. Most companies approached did not keep 
outage data of any type. A few companies had 
data limited to network outages, and a few had 
data on server outages. None kept data on indi- 
vidual client outages. The best source of data 
was a company with a large networking appli- 
cation and about 15,000 users. The company 
keeps detailed records on all outages that affect 
four or more users and provided us with their 
entire set of outage records for 1992. 

All the data collected was categorized accord- 
ing to five types of outage: physical, design, 
operations, environmental, and reconfiguration. 
Table 2 lists the outage categories and their def- 
initions. Table 3 gives examples of outages in 
each category. 


The wide variety of outages in Table 3 
reflects the complexity of the client/server envi- 
ronment. Note the potentially disastrous effect 
some of the outages could have on a business, 
especially the building transformer that caused 
1027 users to be down for 575 minutes, resulting 
in a total of 590,525 user outage minutes.

One of the most interesting findings to emerge 
from the outage data is the difference between 
the actual causes of client/server failure and 
users’ perceptions of those causes. When people 
think of failures, they tend to think of physical 
hardware failures, but according to the outage 
data, physical failures cause only 10 to 20 per- 
cent of unplanned outages. Design, operations, 
and environmental failures are all as common 
as, or more common than, physical failures. 
Planned outages (reconfiguration) are also more 
common than physical failures. Another inter- 
esting point is that client/server users tend to 
blame all outages on the LAN. However, the data 
shows outages of non-fault-tolerant servers to be 
the primary source of user outage minutes (see 
the discussion under “Fault-Tolerant Servers” 
later in the article). 


Outage Causes 

The complexity of a client/server environment 
introduces a wide range of new failure mecha- 
nisms. To establish a list of outage causes and 
obtain outage times for the availability model, 
outage data was augmented with literature sur- 
veys (for example, Caginalp, 1992; Caldwell, 
1992; Saal, 1990), articles quoting downtimes 
(for example, Bowen, 1992), university studies 
(for example, Feather, 1992; Lee et al., 1993),
quotes on hardware reliability from vendors, 
and internal outage data. Table 4 shows a num- 
ber of the outage causes derived from these 
sources, listed according to outage category. 
Some specialized terms for outage causes that
may be unfamiliar to the reader are defined in
Table 5.

Table 3.
Outage examples classified by type of outage.

Outage          Outage   Number of   Total user      Cause of outage and action taken
category        minutes  users down  outage minutes

Physical        9        287         2,583           Clients lost access to the application due to a broadcast
                                                     storm* on the LAN caused by a bad supervisor card in the
                                                     hub. The card was replaced to restore service (a spare was
                                                     available on site).

Physical        143      60          8,580           Sixty workstations did not have access to the application
                                                     due to a router problem. Operations traced the problem to
                                                     a loose cable on the download server. The cable was
                                                     tightened and the routers reloaded, and the clients
                                                     rebooted their workstations.

Design          13       15          195             Fifteen workstations were unable to access the
                                                     application. The problem was traced to a hung file server.
                                                     Operations performed a stop and start of the file-server
                                                     process to restore service to the client.

Design          5        9           45              Clients could not access the application. Operations
                                                     stopped and restarted the name-server process to allow
                                                     access.

Design          90       58          5,220           Clients could not access the application due to a hub
                                                     problem. Operations reset the supervisory card on the 8th
                                                     floor and the Ethernet cards on the 3rd and 4th floors.

Design          24       30          720             Clients lost access to the application. Operations traced
                                                     the problem to an out-of-sync condition between the
                                                     file-server and line-handler processes that occurred
                                                     following a failure of the host application. Both
                                                     processes were stopped and restarted to resynchronize.

Design          31       205         6,355           Clients experienced degraded service following an
                                                     application maintenance release. The release was backed
                                                     out to restore service.

Operations      360      25          9,000           After power was restored, the Ethernet card to a file
                                                     server did not properly restore because an operator had
                                                     configured it incorrectly.

Operations      5        25          125             Clients lost access to the application because a
                                                     maintenance technician on site inadvertently placed
                                                     equipment in a test mode. The technician normalized the
                                                     equipment and users softbooted their workstations to
                                                     restore service.

Environmental   420      25          10,500          Construction group accidentally set off fire-extinguishing
                                                     (Halon) system in computer room.

Environmental   486      40          19,440          Clients lost access to the application due to a failed
                                                     remote router link. The failure was traced to a
                                                     cross-connect problem in the link caused by a power
                                                     failure. The router was reset and service restored.

Environmental   575      1,027       590,525         Clients could not access the application due to a bad fuse
                                                     in the building transformer. Due to the location of the
                                                     transformer, it took several hours to replace the fuse and
                                                     restore power.

Reconfiguration 120      50          6,000           Server network software was upgraded to include Appletalk.

* Broadcast storm is defined in Table 5.



Table 4.
Examples of client/server outage causes.

Physical
    CPU, LAN card, disk, etc., fail
    Babbling node*

Design
    PC with lock on database fails
    Firmware error
    Self-test failure
    Operating-system crash
    Broadcast storm*
    Server hung
    Database corruption
    Disk access error
    LAN protocol error
    Access denied
    Router algorithm conflict*
    Packet errors (runts*, jabbers*)
    Timeouts
    Application freeze
    Network paging*

Operations
    Cable bumped accidentally
    Wrong cable pulled
    Incorrect network address entered
    Data not backed up
    Stopped wrong process
    Table or log erroneously deleted
    Erroneously stopped application

Environmental
    Commercial power fails
    Natural disaster
    Heating, ventilation, or air conditioning fails
    Circuit breaker tripped
    Power failure recovery error
    Virus

Reconfiguration
    Upgrade system
    Add disk
    System move
    New release
    Bug fix
    Workload balancing

* Term is defined in Table 5.

Table 5.
Definitions of specialized terms for describing outages.

Term                      Definition

Babbling node             The transmission of random, meaningless packets onto the
                          network; often caused by a failed LAN card.

Broadcast storm           A broadcast is a special message or packet that all network
                          hosts must receive and process. A broadcast storm is a
                          condition in which excessive broadcasting occurs,
                          potentially disabling the entire network. Broadcast storms
                          are usually due to software errors.

Jabbers                   Packets that are larger than the maximum length allowed by
                          the network protocol (for example, 1518 bytes for Ethernet).

Network paging            Occurs when a workstation runs a job too large for its
                          memory and has to page over the network, causing very heavy
                          network traffic.

Router algorithm          Routers use some form of shortest-path algorithm. If router
conflict                  A thinks that the shortest path to router C is through
                          router B and router B thinks that the shortest path to
                          router C is through router A, packets for router C will be
                          sent back and forth between routers A and B. This type of
                          problem usually occurs because of a breakdown in the
                          routers' shortest-path updating strategy, due to software
                          design errors.

Runts                     Packets that are smaller than the minimum length allowed by
                          the network protocol (for example, 60 bytes for Ethernet).


Most of the outage causes in Table 4 apply to any type of computing
environment, but some are propagating outages unique to client/server
computing. In a propagating outage, a hardware or software failure
affects more than the users or equipment dependent on the failed item.
For example, a router failure should only cause downtime for users
dependent on the servers and hubs connected to the router (see Figure 1
for an illustration of multiple routers with associated hubs, servers,
and users). However, if a router failure causes a router algorithm
conflict (defined in Table 5), this will result in very heavy traffic and
poor response times throughout the LAN and cause all users to think the
LAN is down. In general, propagating outages can affect many users and
result in very large amounts of user outage minutes. In Table 5,
broadcast storm, babbling node, network paging, and router algorithm
conflict are all examples of propagating outages.


Defining a Representative Client/Server Environment

In demonstrating a predictive model of availability, it is useful to
define a representative client/server environment. Such an environment
is not meant to be an optimal configuration or a model for designing
other environments. Its purpose is to provide a basis for illustrating
the effects that different outages would have on users and to make it
easier to interpret numerical values derived from the availability model.


Typical Features of a Client/Server Environment. Client/server
environments are very diverse. They range from a small departmental LAN
with a single server to E-mail networks connecting tens of thousands of
users to hundreds of servers. Even business-critical applications
encompassing networks of ATMs have begun to adopt client/server
architectures. Although it is difficult to define a standard
client/server environment, it is possible to identify characteristic
features, such as those described in the following paragraphs.

In the typical client/server network, clients use local servers
connected by a LAN. Clients are usually physically near their primary
server, although this is not a requirement. A communications server on
the LAN provides wide area network (WAN) access.

There is no standard client/server configuration, but there are standard
client/server components. Servers are generally powerful workstations,
although they can also be minicomputers or mainframes. Servers often
perform a specific type of service, specializing, for example, as file
servers, database servers, and print servers. Clients are most often PCs
or other workstations, although there are certainly other types of
clients, such as ATM machines, handheld devices, and smart phones.
Typical networking components include transceivers, bridges, routers,
hubs, and gateways.

[Figure 1 diagram. Labels: name server, security server, comm server, 25 clients for Server 1, 25 clients for Server 2.]


In a typical LAN with database servers, there 
are 10 to 50 clients for each database server. 
For present purposes, 25 clients per server is 
used as a reasonable average, based on the 
client/server environments studied and the 
judgements of experienced client/server opera- 
tions managers. Equipment in the LAN is lay- 
ered hierarchically, with subnets joined to form 
larger networks. 


A Representative Client/Server Environment. A 
representative environment for use with a pre- 
dictive model of client/server availability 
should be similar to real-world configurations 
in which availability is a primary concern. 
Since availability is of particular importance for 
a business-critical database application, the rep- 
resentative environment used in this article is a 
LAN architecture based on database servers. It 
is illustrated in Figure 1. The LAN incorporates 
the features described in the preceding section. 
Servers on the LAN are generic, non-fault-toler- 
ant servers. Most conclusions reached on the 
basis of the representative LAN apply equally
well to other architectures, such as the typical
campus LAN on which file and print servers
predominate.

In Figure 1, clients and their primary servers 
are connected to a hub. There are 25 clients for 
each primary server and two servers to a hub, 
for a total of 50 clients per hub. Four hubs are 
attached to a router, making a total of 200 
clients per router. Six routers are connected in a 
ring, such as a fiber distributed data interface 
(FDDI) ring. Five of the routers support hubs 
with servers and clients, resulting in a total of 
1,000 clients. The sixth router functions as a 
gateway for communications outside the LAN, 
and also provides a server for network manage- 
ment. Although network management activities 
such as name services and security services are 
often spread throughout a network, the design 
in Figure 1 is sufficient for purposes of
interpreting the availability model of the next section.
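
The component counts of the representative environment follow directly from the per-level figures given above. A minimal Python sketch of that arithmetic is shown here; the constant and function names are hypothetical, and the values are the ones stated in the text.

```python
CLIENTS_PER_SERVER = 25
SERVERS_PER_HUB = 2
HUBS_PER_ROUTER = 4
ROUTERS_WITH_HUBS = 5   # routers that support hubs, servers, and clients
GATEWAY_ROUTERS = 1     # sixth router: gateway plus network management server


def environment_counts() -> dict:
    """Derive the totals for the representative database-server LAN."""
    hubs = HUBS_PER_ROUTER * ROUTERS_WITH_HUBS
    servers = SERVERS_PER_HUB * hubs
    clients = CLIENTS_PER_SERVER * servers
    return {
        "clients_per_hub": CLIENTS_PER_SERVER * SERVERS_PER_HUB,
        "clients_per_router": CLIENTS_PER_SERVER * SERVERS_PER_HUB * HUBS_PER_ROUTER,
        "routers": ROUTERS_WITH_HUBS + GATEWAY_ROUTERS,
        "hubs": hubs,
        "database_servers": servers,
        "clients": clients,
    }


if __name__ == "__main__":
    for name, value in environment_counts().items():
        print(f"{name}: {value}")
```

The totals match the Figure 1 caption: 1,000 clients, 20 hubs, 6 routers, and 40 database servers.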


Figure 1.
A representative database-server architecture containing 1,000 clients,
20 hubs, 6 routers, 40 database servers, and 1 network management server.


TANDEM SYSTEMS REVIEW • APRIL 1994


Table 6. 
Calculation of annual user outage minutes. 
Annual Total Total Annual 
outage number in annual Number user 
minutes c/s envi- outage of clients outage 
Outage cause per item ronment minutes affected minutes 
Hub client LAN-card failure 4 100* 400 10 4,000 
Hub server LAN-card failure 4 40 160 50 8,000 
Hub LAN-card failure causing 4 140** 140 200 28,000 
babbling-node outages 
Server-application software failure 280 40 11,200 50 560,000 


*Based on 1,000 clients and 10 clients per LAN card 
“* Sum of client and server LAN cards 


Figure 2 
Annual Average annual outage Number of such Number of users 
user outage = 4 minutes due toa specific } x 4 elements inthe } x affected by 
minutes failure type and element network each failure 


Figure 2. 


Formula for calculating 
annual user outage 


Table 7.
Summary outage statistics for the representative
client/server environment.

Total annual user outage minutes from all causes (1,000 clients)   12,031,800
Annual outage minutes per client                                       12,032
User availability                                                       97.7%
Percentage of user outage time due to unplanned outages                 68.0%
Percentage of user outage time due to planned outages
  (reconfiguration)                                                     32.0%


A Predictive Model of Client/Server Availability

The client/server availability model adopted
here uses data on outage causes to calculate
annual user outage minutes. The model is
predictive. Given a specific network configuration
and statistics for individual outage causes, the
model predicts annual user outage minutes due
to outages at each network component. The
model is illustrated by applying the outage data
from sources described earlier to the representative
database-server architecture. The formula
for calculating annual user outage minutes
resulting from a given outage cause is given in
Figure 2.

Table 6 shows the calculation of annual user
outage minutes for four types of outages. In the
table, statistics for annual outage minutes are
derived from the outage sources cited earlier.
Values in column 3, “Total number in c/s
environment,” are based on the representative
client/server environment of Figure 1 and the
component causing an outage. “Total annual
outage minutes,” column 4, is the product of
columns 2 and 3. Column 5, “Number of clients
affected,” is the number of clients ultimately
affected by an outage. “Annual user outage
minutes,” column 6, is the product of columns
4 and 5.

The full procedure for calculating annual user
outage minutes for a given outage cause can be
illustrated with reference to Table 6. For LAN
cards, vendors quote a mean time between
failures (MTBF) of 300,000 hours. To service the
failure, user data shows a mean time to repair
(MTTR) of three hours. Thus, on average, a LAN
card causes one hour of outage for every 100,000
hours of operation.

Given 525,600 minutes in a year, the calculation
of expected annual outage minutes for a
LAN card is:

Annual outage minutes per LAN card
    = 1/100,000 x 525,600 min/yr
    = 5.3 min/yr

Thus, for an individual LAN card, there is
an expected annual outage time of about five
minutes.

Based on user data, about 20 percent of LAN-
card failures are babbling-node (propagating)
failures.² Thus, approximately one minute of
annual LAN-card outage is for babbling-
node failures (row 3 in Table 6) and four
minutes are for nonpropagating failures (rows 1
and 2).

On average, each LAN card in a hub can
support 10 clients. Since there are 1,000 clients
in the representative client/server environment,
there are 100 client cards in hubs on the LAN.
Total annual outage minutes (column 4 in
Table 6) is 4 x 100, or 400 minutes. Finally, since
10 clients are affected by each nonpropagating
client LAN-card failure, 400 x 10, or 4,000 annual
user outage minutes are due to hub client LAN-
card failures.
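
The arithmetic in this worked example can be checked with a short script. The following is a minimal sketch of the Figure 2 formula, not part of the original model; the function and variable names are illustrative assumptions, and the split of the 5.3 minutes into 4 nonpropagating minutes and 1 babbling-node minute is taken from the text rather than computed.

```python
MINUTES_PER_YEAR = 525_600

def annual_outage_minutes(mtbf_hours, mttr_hours):
    """Expected outage minutes per year for one component."""
    return mttr_hours / mtbf_hours * MINUTES_PER_YEAR

def annual_user_outage_minutes(minutes_per_item, items, users_affected):
    """Figure 2: per-item outage x number of items x users affected."""
    return minutes_per_item * items * users_affected

# LAN card: MTBF of 300,000 hours, MTTR of 3 hours.
per_card = annual_outage_minutes(mtbf_hours=300_000, mttr_hours=3)
print(round(per_card, 1))  # 5.3

# Row 1 of Table 6: hub client LAN-card failures.
clients = 1_000
clients_per_card = 10
client_cards = clients // clients_per_card

hub_client = annual_user_outage_minutes(4, client_cards, clients_per_card)
print(hub_client)  # 4000
```

The other rows of Table 6 follow by changing the three arguments, for example `annual_user_outage_minutes(280, 40, 50)` for server-application software failures.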


Results from the Model 


Statistics in this and the following sections 
result from applying the predictive model to 
the representative client/server environment. 
Unless otherwise specified, server data is for 
non-fault-tolerant servers. Table 7 provides 
summary statistics. 

As Table 7 shows, the model predicts a total 
of 12,031,800 annual user outage minutes in the 
representative client/server environment. With 
1,000 clients, there are approximately 12,000 
annual outage minutes, or 200 hours of annual 
outage time, per client. This is equivalent to 
97.7 percent user availability. This level of user 
availability is in accord with availability figures 
from other sources (see Wood, 1994). Table 7 
shows that unplanned outages account for 
approximately 68 percent of total outage time 
and planned outages (reconfiguration) account 
for approximately 32 percent of total outage 
time. These figures are also similar to other 
availability findings. 
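
As a quick check on these figures, the sketch below derives per-client outage time and user availability from the Table 7 total. It is an illustration only; the variable names are assumptions, and the only inputs are the total from Table 7 and the 525,600 minutes in a year used earlier.

```python
MINUTES_PER_YEAR = 525_600

total_user_outage_minutes = 12_031_800  # Table 7, all causes
clients = 1_000

per_client = total_user_outage_minutes / clients
availability = 1 - per_client / MINUTES_PER_YEAR

print(f"{per_client:,.0f} outage minutes per client")  # 12,032
print(f"{availability:.1%} user availability")         # 97.7%
```

Dividing `per_client` by 60 gives roughly 200 hours of outage per client per year, as stated in the text.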

Figure 3 shows the proportion of total outage 
time attributable to client, server, and network 
outages, and to operations or environmental out- 
ages not specific to a single type of equipment. 
An example of a nonspecific outage is a power 
failure (environmental outage) that causes all 
equipment in a building to shut down. The most 
significant finding is that server outages account 
for almost two-thirds of all user outage minutes. 
Another significant finding, in view of the com- 
mon perception that the LAN is always to blame 
for outages, is that network outages only account 
for about 10 percent of total user outage minutes. 

Figure 4 shows the percentage of annual user 
outage minutes attributable to each outage cate- 
gory defined in Table 2. Each category is subdi- 
vided into outage percentages for server, client, 
network, and all. In the figure, “‘All” refers to 
events such as power failures or building moves 
that affect all types of equipment. 

Note that design outages account for 39 percent
of user outage minutes, reconfiguration
outages account for 32 percent, and physical
outages account for only about 11 percent of
all outage minutes. Thus, although users typically
perceive outages as hardware failures,
applying the predictive model to the representative
client/server environment shows hardware
outages to play a relatively minor role compared
to other outage categories. This is the case even
though the data does not include fault-tolerant
servers or other fault-tolerant components.
Reducing outage times by adding fault-tolerant
components is described in Part 2.


² Wood (1994) shows the derivation of annual outage minutes for each of
the outage causes listed in Table 4.


Figure 3.
Proportion of total outage time attributable to different
types of equipment outages. (Bar chart of percentage of total
outage minutes for client, server, network, and nonspecific
outages, divided into planned (reconfiguration) and unplanned
outages. Each interval of 10% equals 1.2 million outage minutes.)


Figure 4.
User outage minutes by outage category. (Bar chart of percentage
of total outage minutes for the physical, design, operations,
environment, and reconfiguration categories, divided into all,
client, network, and server. Each interval of 10% equals
1.2 million outage minutes.)
Part 2: Improving Availability in a
Client/Server Environment


Client/server availability can be improved by:


■ Decreasing the frequency of outages (increasing
the time between outages).


■ Decreasing the duration of outages (decreasing
detection, repair, and recovery times).


■ Decreasing the number of users affected by
an outage.


It is not possible to simply “throw hardware” 
at the problem. For example, if the number 
of client LAN cards is doubled without adding 
fault tolerance, so that the number of users 
affected by a LAN card failure is cut in half, the 
expected number of LAN card failures doubles, 
because there are now twice as many cards. 
Therefore, the net effect of doubling the LAN 
cards is zero. There are, however, a number of 
other approaches to improving availability, as 
described in the following sections. 
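
The cancellation described above can be seen by applying the user-outage formula before and after doubling the cards. This is a minimal sketch under the text's assumptions (4 outage minutes per card per year, 1,000 clients spread evenly across the cards); the function name is illustrative.

```python
def user_outage_minutes(minutes_per_card, cards, clients):
    """Per-card outage x number of cards x clients sharing each card."""
    clients_per_card = clients / cards
    return minutes_per_card * cards * clients_per_card

baseline = user_outage_minutes(4, cards=100, clients=1_000)  # 10 clients per card
doubled = user_outage_minutes(4, cards=200, clients=1_000)   # 5 clients per card

print(baseline, doubled)  # 4000.0 4000.0
```

Because the number of cards appears once as a multiplier and once as a divisor, it drops out of the product: without redundancy, only the per-card outage time or the total client count changes the result.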

Improved availability provides a savings in 
user downtime, but may also involve a certain 
cost, such as the expense of providing an unin- 
terruptible power supply. The predictive model 
and representative client/server environment 
can show how much a specific enhancement or 
new feature is likely to reduce user outage time. 
The overall cost savings from an outage reduc- 
tion will vary from site to site. In evaluating 
methods for improving availability, sites need 
to carry out their own analyses of costs and 
benefits. 


Operational Improvements 


Operations errors such as improper installation 
of hardware or software, incorrectly entering 
network addresses, and accidentally or incor- 
rectly halting system or user processes account 
for just over 14 percent of user outage minutes. 
Although users other than operations personnel 
frequently contribute to such outages, improv- 
ing the effectiveness of operations staff would 
significantly reduce outage times. 

As a useful start toward improving availabil- 
ity, whenever there is an outage, operations can 
attempt to identify its cause, time its duration, 
calculate user outage minutes, and record the 
information in a log. The logged information 
can then be used in setting priorities and plan- 
ning ways of minimizing outage times. 
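
One possible shape for such a log is sketched below. The record fields, sample causes, dates, and user counts are hypothetical and chosen only to show how user outage minutes could be computed and ranked; a real site would substitute its own data.

```python
from datetime import datetime

class OutageRecord:
    """One logged outage (field names are illustrative)."""

    def __init__(self, cause, start, end, users_affected):
        self.cause = cause
        self.start = start
        self.end = end
        self.users_affected = users_affected

    def duration_minutes(self):
        return (self.end - self.start).total_seconds() / 60

    def user_outage_minutes(self):
        return self.duration_minutes() * self.users_affected

# Two made-up entries.
log = [
    OutageRecord("hub client LAN-card failure",
                 datetime(1994, 4, 4, 9, 0), datetime(1994, 4, 4, 12, 0), 10),
    OutageRecord("server-application software failure",
                 datetime(1994, 4, 6, 14, 0), datetime(1994, 4, 6, 14, 30), 50),
]

# Total user outage minutes per cause, largest first, to help set priorities.
by_cause = {}
for record in log:
    by_cause[record.cause] = by_cause.get(record.cause, 0) + record.user_outage_minutes()

for cause, minutes in sorted(by_cause.items(), key=lambda item: item[1], reverse=True):
    print(f"{minutes:8.0f}  {cause}")
```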


Improved Training and Tools. In the course of 
researching client/server availability, there were 
many discussions with operations personnel. 
Operations staff members felt that with improved 
training and tools they could more quickly detect 
problems and their causes, make repairs, and 
carry out recovery procedures. Since there is no 
data on the subject, it is difficult to predict the 
number of user outage minutes that might be 
saved through improved training or tools. 
However, if better training and tools could 
reduce the average duration of outages by 10 
percent, this would be equivalent to a 10 per- 
cent reduction in total outage times, a savings 
of over 1,200,000 annual user outage minutes. 

An interesting and important finding from 
the model is that propagating failures account 
for over a third of user outage minutes. If an 
operations staff could be trained in ways to find 
and localize the effects of a fault, it might be 
able to significantly reduce user outage times 
due to propagating failures. For example, if 
operations personnel can quickly determine the 
source of a babbling node or broadcast storm, 
they can temporarily remove it from the net- 
work and prevent the outage from affecting 
large numbers of other nodes.³


³ Tools that allow an operator to view network performance significantly
improve the operator’s ability to troubleshoot network problems. An
example of such a tool is the Ungermann-Bass® NetDirector® product.
Testing to Ensure Successful Restarts After an
Outage. Equipment recovery problems follow- 
ing an outage often increase the outage time. 
Approximately 2.5 percent (300,000 minutes) 
of user outage time in the representative 
client/server environment is directly attributable 
to equipment that does not reset properly follow- 
ing a power outage. Advance power-cycle test- 
ing of equipment to make sure it will restart 
after an outage can help to reduce outage dura- 
tions. Restarting after recovery from other kinds 
of outages can also be a problem. One way to 
reduce such outages is to test systems in advance 
using simulated failure conditions. Often this 
can be done by testing a few components offline 
without affecting normal operations. 


Improved Cable Layout and Problem Tracing. 
Cabling is a frequent source of problems in a 
client/server environment. About 1 percent of
user outage minutes in the representative LAN
are directly attributable to cable problems caused
by operations personnel, such as accidentally
disconnecting power cords. However, other
problems may be created by the difficulty of tracing
cables through the spaghetti that often exists in 
LAN closets. In many cases, improving cable 
layout would make it possible for operators to 
trace problems more quickly and thus reduce 
recovery times. Improving cable layout may 
require expenditures for added hubs or other 
hardware and for additional personnel to trace 
and mark cables and possibly redo the existing 
cable layout. 


Adding Fault-Tolerant Components 


Adding fault-tolerant components to the repre- 
sentative LAN could reduce user outage times 
by as much as 70 percent. Maximum fault toler- 
ance would require fault-tolerant servers, fault- 
tolerant clients, fault-tolerant client and server 
LAN connections, and a fault-tolerant LAN. 


Fault-Tolerant Servers. The model showed non- 
fault-tolerant servers to be the primary source 
of client/server outages, accounting for almost 
two-thirds of lost user time in the representative 
client/server environment. Fault-tolerant servers 


can provide a major reduction in user outage 
times. Outage data for Tandem™ fault-tolerant 
servers has been gathered since 1992. Based on 
this data, if all servers in the representative 
client/server environment were fault-tolerant, 
server outage times would be reduced by a factor
of six and total annual user outage times
would be cut in half, dropping from over 12 
million minutes to under 5.5 million minutes. 
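
These figures are mutually consistent, as the back-of-the-envelope check below suggests. It is a sketch only: it takes the 6,542,000 minutes saved from Table 8 and treats "a factor of six" as exact, which is an assumption made here for illustration.

```python
total = 12_031_800      # Table 7: total annual user outage minutes
saved = 6_542_000       # Table 8: fault-tolerant Tandem servers
reduction_factor = 6    # "a factor of six", assumed exact

# If server outages shrink to 1/6 of their former level, the savings
# are 5/6 of the original server outage minutes.
server_before = saved / (1 - 1 / reduction_factor)
server_share = server_before / total
total_after = total - saved

print(f"{server_before:,.0f} server outage minutes before")  # 7,850,400
print(f"{server_share:.1%} of all user outage minutes")      # 65.2%
print(f"{total_after:,} total minutes after")                # 5,489,800
```

The implied server share of about 65 percent matches "almost two-thirds," and the remaining total is just under 5.5 million minutes.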

The improvement in user outage times pro- 
vided by fault-tolerant servers can be under- 
stood by referring to Figure 4, which shows the 
proportion of server outages in each of the basic 
outage categories. Fault-tolerant servers almost 
entirely eliminate physical server outages 
because of the redundancy they provide. Their 
software fault tolerance significantly reduces 
server design outages, since most design bugs 
are transient and cause a failure in only a single 
process or processor. Operations outages are 
reduced because stopping a single process or 
processor, whether by mistake or for purposes 
of testing, does not cause a system outage. In 
addition, the ability to reconfigure Tandem 
servers online can considerably reduce recon- 
figuration outage times. 


Fault-Tolerant LAN Connections. Fault-tolerant 
server-to-LAN and client-to-LAN connections 
can reduce user outage times by approximately 
3 percent, or 362,000 minutes in the representa- 
tive LAN. 

A fault-tolerant server-to-LAN connection 
effectively eliminates user outage minutes due 
to LAN card failures in the server and in the hub 
or other server-to-network connection point. In 
the representative client/server environment, 
fault-tolerant server-to-LAN connections would 
result in a savings of 90,000 user outage minutes 
per year. 
Figure 5.
Types of client/server LAN connections: (a) LAN connections
without fault tolerance; (b) fault-tolerant LAN connections and
fault-tolerant server; (c) fault-tolerant LAN with fault-tolerant
server and client. (Diagram labels: fault-tolerant server,
fault-tolerant server-LAN connection, fault-tolerant LAN,
fault-tolerant client-LAN connection, fault-tolerant client, client.)


In a fault-tolerant client-to-LAN connection,
dual-ported or redundant LAN cards in clients
can take advantage of redundant paths through
the network. This eliminates outage minutes
due to LAN-card failures in the client and in the
server-to-client connection. Fault-tolerant
client-to-LAN connections can be an inexpensive
alternative to using fault-tolerant PCs or
client workstations. In the representative
client/server environment, fault-tolerant client-
to-LAN connections would result in a savings
of 272,000 user outage minutes per year.⁴

Figure 5a shows a client/server configuration
with no fault-tolerant components. Figure 5b
shows a configuration with fault-tolerant
servers and fault-tolerant LAN connections.


⁴ Recently announced Tandem and Ungermann-Bass NonStop™ Access for
Networking products provide the hardware and software necessary for
fault-tolerant server-to-LAN and client-to-LAN connections.
Fault-Tolerant LAN. The most effective way to
reduce network outages is to make a LAN fault- 
tolerant. This will almost completely eliminate 
user outage times due to the failure of network 
equipment and network reconfigurations. In 
addition, it can greatly reduce outage times 
resulting from propagating failures such as 
broadcast storms and babbling nodes. Imple- 
menting a fault-tolerant LAN requires that there 
be (1) multiple, independent paths between all 
clients and servers and (2) software that can 
automatically detect the loss of a server-to- 
client connection and switch to an alternative 
path without user intervention.⁵ If the representative
LAN were fault tolerant, it would save
1,197,000 user outage minutes, almost 10 per- 
cent of total user outage minutes throughout 
the LAN. Figure 5c shows a fault-tolerant LAN 
configuration. 
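
The second requirement, automatic detection and switchover, can be pictured with the toy sketch below. It is not how any Tandem or Ungermann-Bass product works; the path functions, the use of `ConnectionError` to signal a lost connection, and the message content are all assumptions made for illustration.

```python
def send_with_failover(message, paths):
    """Try each independent path in order; return the name of the one that worked."""
    for name, send in paths:
        try:
            send(message)
            return name
        except ConnectionError:
            continue  # this path is down; try the next one without user action
    raise ConnectionError("no path to server available")

def primary(message):
    # Simulates a failed component on the first path.
    raise ConnectionError("hub LAN card failed")

delivered = []

def backup(message):
    # Simulates a healthy second, independent path.
    delivered.append(message)

used = send_with_failover("debit account", [("primary", primary), ("backup", backup)])
print(used)  # backup
```

The caller never sees the failure of the first path, which is the property that removes the outage from the user's point of view.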


Fault-Tolerant Clients. Fault-tolerant clients can 
protect against most client hardware failures, 
including disk and power failures. Adding 
fault-tolerant clients to the representative 
client/server environment would save 388,000 
user outage minutes.⁶

Currently, fault-tolerant clients are not de- 
signed to protect against software failures or 
to allow online reconfiguration. Because fault- 
tolerant clients are expensive, a reasonable alter- 
native is to have extra clients available in a hot 
standby mode. 


Reestablishing Interrupted Client Sessions 
The most common client failure mode is a tran- 
sient error in the client, network, or server that 
forces the user to reboot or restart an applica- 
tion. By itself, this generally takes at most a few 
minutes, but the user then has to reestablish the 
session and ascertain the status of any outstand- 
ing transactions. If the client lacks information 
about the interrupted session, reestablishing the 
session can take several minutes. Once the ses- 
sion is reestablished, the client must determine 
the status of transactions that were in process 
when the session was disrupted. 


⁵ Recently announced Tandem and Ungermann-Bass NonStop Access for
Networking products include the hardware and software necessary for a
fault-tolerant LAN.


⁶ Here and in Table 8, data for fault-tolerant clients is based on fault-
tolerant PCs.


If both client and server maintain a transac- 
tion log and other information necessary for 
reestablishing a client session, downtime can be 
reduced significantly. Once the client reboots 
and logs in, the server can reestablish the ses- 
sion, provide the status of client transactions, 
and resume processing for the client. Such mea- 
sures can reduce user outage times by 2.4 per- 
cent, or 292,000 minutes in the representative 
client/server environment. 
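
A minimal sketch of the server side of this idea follows. The class, method names, and identifiers are hypothetical, and only the server's copy of the session and transaction log is modeled; the article also calls for a log on the client, which is omitted here for brevity.

```python
class Server:
    """Keeps per-user session state and a transaction log across client restarts."""

    def __init__(self):
        self.sessions = {}  # user -> {"open_tx": [transaction ids]}
        self.tx_log = {}    # transaction id -> "pending" or "committed"

    def begin(self, user, tx_id):
        self.sessions.setdefault(user, {"open_tx": []})["open_tx"].append(tx_id)
        self.tx_log[tx_id] = "pending"

    def commit(self, tx_id):
        self.tx_log[tx_id] = "committed"

    def reestablish(self, user):
        """Return the status of every transaction the user had started."""
        session = self.sessions.get(user, {"open_tx": []})
        return {tx_id: self.tx_log[tx_id] for tx_id in session["open_tx"]}

server = Server()
server.begin("agent7", "tx-1")
server.commit("tx-1")
server.begin("agent7", "tx-2")
# ... the client fails and reboots here, then logs in again ...
status = server.reestablish("agent7")
print(status)  # {'tx-1': 'committed', 'tx-2': 'pending'}
```

With this information returned at login, the user does not have to work out by hand which transactions completed before the interruption.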


Uninterruptible Power Supplies
Environmental outages are dominated by commercial
power failures. The simplest way to
protect against such outages is to provide
uninterruptible power supplies (UPSs) and to improve
site power filtering and conditioning where possible.
Introducing UPSs into the representative
client/server environment could reduce user
outages by 389,000 minutes, a 3 percent reduction
in total user outage minutes. Client/server
computing complicates this protection: power
outages are easier to guard against at a
consolidated site, but client/server environments
tend toward site expansion.


Online Reconfiguration for System Moves 


The model predicts about 777,000 user outage 
minutes for the representative client/server 
environment due to massive reconfigurations 
such as system moves. A fault-tolerant server, 
network, and client architecture can help reduce 
this downtime if it can be configured as two 
independent client/server systems. Conceptual- 
ly, this allows half the client/server hardware to 
be moved to a new site while the other half con- 
tinues to run the application. The hardware at the 
new site can then be configured to run required 
applications while the remaining equipment is 
moved. This could provide a 7 percent reduction 
in total user outage minutes. 
Table 8.
Methods for reducing user outage minutes.

Feature                                     Minutes saved    Percent of total
                                            (in 1000s)*      outage time**

Fault-tolerant Tandem servers                   6,542             54.4%
    Comments: Based on Tandem data.

Fault-tolerant server-LAN connections              90              0.7%
    Comments: Reduces server-LAN card outages by 90%, babbling node by
    50%. Reduces hub-card and router-card failures by 15%.

Fault-tolerant client-LAN connections             272              2.3%
    Comments: Reduces client and network LAN-card failures by 50%, and
    babbling node outages by 50%.

Fault-tolerant LAN                              1,197              9.9%
    Comments: Reduces most network failure modes by 90-99%, router
    algorithm conflicts by 75%, broadcast errors by 50%, operations
    cable accidents by 90%, operations testing by 50%.

Fault-tolerant clients (PCs)                      388              3.2%
    Comments: Reduces client (PC) hardware outages by 95%.

Reestablish interrupted client session            292              2.4%
    Comments: Reduces client OS and application outage times by 75%
    through faster recoveries.

Uninterruptible power supplies                    389              3.2%
    Comments: Reduces power outages by 90%.

Online reconfiguration for system moves           777              6.5%
    Comments: Reduces user outage time due to system moves by 90%.


*  Number of user outage minutes saved by the feature, based on a total of 12,032,000 user outage minutes for the representative LAN.

** Reduction in total user outage minutes, expressed as a percentage.
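
The percentage column of Table 8 can be reproduced from the minutes-saved column and the 12,032,000-minute total given in the table footnote. The sketch below does only that; the dictionary layout is an illustrative choice, and no attempt is made to add the savings together, since the article does not say whether they are independent.

```python
TOTAL_THOUSANDS = 12_032  # total user outage minutes, in thousands (Table 8 footnote)

# Minutes saved, in thousands, as listed in Table 8.
minutes_saved = {
    "Fault-tolerant Tandem servers": 6_542,
    "Fault-tolerant server-LAN connections": 90,
    "Fault-tolerant client-LAN connections": 272,
    "Fault-tolerant LAN": 1_197,
    "Fault-tolerant clients (PCs)": 388,
    "Reestablish interrupted client session": 292,
    "Uninterruptible power supplies": 389,
    "Online reconfiguration for system moves": 777,
}

percent_saved = {
    feature: round(100 * saved / TOTAL_THOUSANDS, 1)
    for feature, saved in minutes_saved.items()
}

for feature, percent in percent_saved.items():
    print(f"{feature:42s} {minutes_saved[feature]:6,d} {percent:5.1f}%")
```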


Summary of Methods for Improving 
Availability 

With the exception of operational improvements, 
Table 8 lists each of the methods for improving 
client/server availability and its potential effect 
on user outage minutes. In each case, the sav- 
ings in user outage time provided by a feature 
should be weighed against the possible cost of 
implementing the feature. Operational improve- 
ments are not included in the table because they 
are not associated with specific reductions in 
user outage times. 


Conclusion 


The most useful way to evaluate client/server 
availability is in terms of user outage times. 
Using outage data from a number of indepen- 
dent sources, a predictive model of client/server 
availability generated statistics on annual user 
outage minutes for a representative client/server 
environment. Outages were categorized as 
physical, design, operations, environmental, or 
reconfiguration. Results were presented by 
equipment type and outage category. 

The major finding concerning equipment 
type was that non-fault-tolerant servers accounted 
for almost two-thirds of all user outage minutes 
in the representative client/server environment. 
Based on data for Tandem servers, it was found 
that fault-tolerant servers could reduce server 
outages by a factor of six and cut total user out- 
age times in half. The findings for outage cate- 
gories showed that design outages and planned 
(reconfiguration) outages accounted for the great 
majority of user outage minutes. Design outages
were responsible for 39 percent of user outage
minutes, and reconfiguration outages were
responsible for 32 percent.
A number of approaches to increasing
client/server availability were presented. These
included operational improvements, the use of
fault-tolerant servers and fault-tolerant LAN
connections, uninterruptible power supplies,
and the reestablishment of client sessions
following a temporary break.
FEATURE


Automating Call Centers With CAM


Many call centers are turning to automation
to improve productivity, profitability, and
customer service. (A call center is any business
center that handles the majority of its work over
the phone. A good example is an airline
reservation center.)

To automate a call center, however, one must 
have intimate knowledge of the telephone switch 
and the protocol it uses to communicate with the 
computer. Making the transition from the tradi- 
tional call center to an automated call center can 
be costly and time-consuming. 

The Tandem™ Call Applications Manager 
(CAM) software provides computer-telephone 
integration (CTI), linking applications on 
Tandem NonStop™ systems with telephone 
switches. CAM reduces the time it takes to inte- 
grate existing call center applications, or write 
new ones, through its simplified application pro- 
gram interface (API). This switch-independent 
API command set allows users to tailor their 
application’s architecture to meet their specific 
needs. Users can focus on the application instead 
of worrying about CTI protocol and how to 
manage it. 


This article discusses the role CAM plays in 
an automated call center. It includes two sce- 
narios that describe how CAM interacts with 
an application to handle incoming calls. It then 
explains the basic call center functions provided 
by CAM and the features and benefits CAM 
offers to programmers developing call center 
applications. Finally, the article explores the 
architectural choices developers can make when 
building call center applications with CAM. 


Benefits of Using CAM 


In a traditional call center, the telephone agent 
receives a call and then has to use the terminal 
keyboard to retrieve information related to the 
customer’s call. This can take many seconds. 
CAM can automatically retrieve a caller’s phone 
number from the telephone switch and pass it to 
the application. By using CAM, a call center 
application can deliver information about a cus- 
tomer to an agent’s terminal screen at the same 
instant the agent receives the customer’s phone 
call. If a customer requires special information 
or service, CAM can route the call, along with 
the accompanying account information, to a 
second agent. 

CAM also supports integrated voice response 
(IVR) applications, which use recorded or auto- 
mated voice response units (VRUs) to offer the 
caller a menu of services. An IVR application 
can use CAM to route calls from the recorded 
voice menu to a live agent. Moreover, an appli- 
cation using CAM can automatically make out- 
bound calls for agents, which helps agents work 
productively when they are not receiving 
incoming calls. 
Scenario 1: A Customer Requests Two Services From a Call Center


In this scenario, a bank’s automated call center pro- 
vides two different services for a customer. The cus- 
tomer owns a small business and uses this bank for 
all his business needs. 


1. The customer phones the bank’s call center from 
his business and gets connected to the first avail- 
able agent in the general customer representative 
group (Agent A). The switch also notifies CAM 
of the call just sent to Agent A. This message con- 
tains, among other information, the customer’s 
phone number, provided by automatic number 
identification (ANI). 

2. CAM passes this information to the application, 
which uses the ANI to find the customer’s account 
information and display it on Agent A’s terminal 
screen at the same time that Agent A answers the 
phone. 


3. The customer asks Agent A some questions about 
deposits he made last week. Agent A can help him 
right away because the customer’s account infor- 
mation is already on the terminal screen. 

4. The customer then asks about a new business loan 
advertised by the bank. Loans being a specialty 
item, Agent A uses the terminal keyboard to trans- 
fer the customer to the loan department. 

5. The application sends a transfer-request message 
to CAM, together with a data buffer that contains 
the customer’s account information and his request 
for information about the business loan. 


6. CAM saves the transfer buffer and sends the trans- 
fer request to the switch. 

7. The switch sends the call to an agent in the loan
department (Agent B) and also notifies CAM of the
call transfer.


8. CAM determines that the call offered to Agent B 
is a transfer call from Agent A. CAM attaches the 
original data buffer and passes it to the application 
used by Agent B. 


9. The application displays the customer’s account 
information and a history of the previous transac- 
tions on Agent B’s terminal screen. Agent B can 
now speak to the customer knowing who he is, 
what has just transpired, and that he wants infor- 
mation about the new business loan. 
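
Steps 5 through 9 can be sketched as a small simulation. This is not the CAM API, which the article does not show; every class, method, and field name below is hypothetical, and the switch and application are reduced to stubs that record what they were asked to do.

```python
class CallManager:
    """Toy stand-in for the role CAM plays in steps 5 through 8."""

    def __init__(self):
        self.pending_transfers = {}  # call id -> saved data buffer

    def request_transfer(self, call_id, buffer, switch):
        self.pending_transfers[call_id] = buffer  # step 6: save the transfer buffer
        switch.transfer(call_id)                  # step 6: pass the request to the switch

    def on_call_offered(self, call_id, agent, application):
        # Step 8: if this call is a transfer, attach the original buffer.
        buffer = self.pending_transfers.pop(call_id, None)
        application.show(agent, buffer)           # step 9: display on the agent's screen

class Switch:
    def __init__(self):
        self.transferred = []

    def transfer(self, call_id):
        self.transferred.append(call_id)

class Application:
    def __init__(self):
        self.screens = {}

    def show(self, agent, buffer):
        self.screens[agent] = buffer

cam, switch, app = CallManager(), Switch(), Application()

# Step 5: Agent A's application asks for a transfer and supplies the data buffer.
cam.request_transfer("call-1", {"account": "12345", "request": "business loan"}, switch)

# Step 7: the switch offers the call to Agent B and notifies the call manager.
cam.on_call_offered("call-1", "Agent B", app)

print(app.screens["Agent B"])  # {'account': '12345', 'request': 'business loan'}
```

Keying the saved buffer by call identifier is one plausible way to recognize a transferred call when the switch reports it; the scenario itself does not say how CAM makes that match.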


(Illustration: the customer, Agent A in the customer rep group,
and Agent B in the loan group.)


Thus, CAM makes it possible for a call cen- 
ter to provide enhanced services to customers. 
CAM also helps reduce the cost of maintaining 
a call center. The two accompanying scenarios
show how an automated call center
using CAM produces both tangible savings and
other benefits.

Many call centers pay for incoming calls by 
offering 800-number services. By displaying 
customer information automatically, the call 
center application can save the agent many 
valuable seconds. With the account information 
already on the screen, the agent can dispense 
with basic questions and serve the customer 
promptly and efficiently. 


More time is saved when a call is transferred 
to another agent. The second agent immediately 
knows what conversation took place with the 
previous agent and what the customer wants. 
These time savings translate into dollar savings 
in the phone bill. Moreover, fewer agents can 
handle more phone calls, and the agents can 
provide more personable service when they 
have immediate access to customer information. 
Scenario 2: An Automated Call Center Using a Voice Response Unit


In this scenario, a bank’s automated call center uses
a voice response unit (VRU) to take incoming calls.
The customer calls in and has a dialog with a VRU.
The VRU collects information from the customer and
transfers the call to a live agent, who offers further
help.

1. The customer phones the call center and gets
connected to the VRU. The switch notifies CAM that
a call has been sent to the VRU. CAM then passes
the information to the application.

2. The customer and the VRU have a dialog. The cus- 
tomer obtains her account balance from the VRU 
after entering her account number and personal 
identification number (PIN). 

After completing this transaction, the customer 
asks to speak to a customer representative. She 
wants information about a home equity loan. 


3. The application controlling the VRU sends a request 
to CAM to transfer the call to the loan department. 
The data buffer attached to the transfer request con- 
tains information about the customer’s account and 
the transaction that just took place. 

4. As in Scenario 1, CAM sends the transfer request
to the switch, which transfers the call to Agent B
in the loan department and notifies CAM. CAM
sends the accompanying information to the
application used by Agent B, which displays it on
Agent B’s terminal screen.

By the time the call is transferred, Agent B knows
who the customer is and is ready to discuss the home
equity loan with her.


Automated Call Centers Using CAM


An automated call center integrates the telephone
(switch) and the terminal screen (host
computer) through the CTI link. Automation
allows the agent to manipulate telephone functions
from the terminal keyboard as well as
receive detailed call information passed from
the switch.

CAM is an application software product that
provides the link between the telephone switch
receiving calls and the application that manipulates
customer information related to those calls.
In particular, CAM performs the following
functions:


■ Maintains the communication link to the
switch.

■ Facilitates the transfer of data or sessions
screens to other agents.

■ Routes messages to the appropriate application
processes.

■ Provides access to switch-specific features
such as logging an agent in and out and making
an agent’s phone available or unavailable.

■ Handles the application protocols of different
switches.

■ Converts messages from different switches to
a common format.

■ Supports interfaces to the Tandem NonStop
Kernel operating system and the Tandem
Pathway transaction processing environment.
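
The idea of converting messages from different switches to a common format can be illustrated as follows. Both wire formats, the switch names, and the field names are invented for this sketch; they do not describe any real switch protocol or the actual CAM message layout.

```python
def from_switch_a(raw):
    # Hypothetical text format: "RING|<calling number>|<dialed number>"
    event, calling, dialed = raw.split("|")
    return {
        "event": "call_offered" if event == "RING" else event.lower(),
        "calling": calling,
        "dialed": dialed,
    }

def from_switch_b(raw):
    # Hypothetical structured format: {"type": ..., "ani": ..., "dnis": ...}
    return {
        "event": "call_" + raw["type"],
        "calling": raw["ani"],
        "dialed": raw["dnis"],
    }

CONVERTERS = {"switch_a": from_switch_a, "switch_b": from_switch_b}

def normalize(switch_type, raw):
    """Convert a switch-specific message into one common dictionary layout."""
    return CONVERTERS[switch_type](raw)

a = normalize("switch_a", "RING|4085551234|8005550100")
b = normalize("switch_b", {"type": "offered", "ani": "4085551234", "dnis": "8005550100"})

print(a == b)  # True
```

Because the application sees only the common layout, supporting another switch in this sketch means adding one converter, with no change to application code.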
The CTI protocol allows the switch to pass
call information such as the calling and dialed 
numbers to CAM. CAM selects the appropriate 
application program for handling the call and 
passes on the information. The application 
obtains additional data required to handle the 
call and sends it to the agent’s terminal screen. 
The application can use Pathway software to 
access, format, and deliver data to an agent’s 
terminal screen. One can also use a client appli- 
cation on an agent’s workstation to deliver 
screen data. Figure 1 shows a diagram of an
automated call center using CAM. 


Using Phone Numbers to Customize Call 
Processing 

When a switch connects an incoming call, the 
call center application uses the information 
received from CAM to customize call process- 
ing. If the telephone network provides automatic 
number identification (ANI) or calling line iden- 
tification (CLID), CAM relays the calling num- 
ber to the application, so caller-specific data 
can be retrieved from the application database 
and custom scripts delivered to the agent. 

CAM can also obtain the number dialed by 
the caller through the Dialed Number Informa- 
tion Service (DNIS), provided by the telephone 
network. CAM passes this number to the appli- 
cation, which can use it to identify the nature of 
the call and handle the call accordingly. Using 
the DNIS, the application not only delivers the 
call to the appropriate agent group, but also dis- 
plays an appropriate menu on the agent’s termi- 
nal screen. 

The application can also use DNIS informa- 
tion to allow a single agent to handle multiple 
functions independently and efficiently. The 
application can deliver different functional 
screens to the agent’s terminal, based on the 
DNIS number. As a result, the agent instantly 
knows which service the customer wants and 
greets the customer appropriately. 
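
As a rough illustration of the ANI and DNIS handling described above, the following Python sketch shows how an application might pick an agent group and a screen from the dialed number and look up caller data from the calling number. The table contents, field names, and the plan_call function are hypothetical; the article does not describe the CAM messages at this level of detail.

```python
# Hypothetical lookup tables; a real application would read these
# from its own database rather than hard-code them.
DNIS_TABLE = {
    "8005550100": ("loans", "LOAN_MENU"),
    "8005550199": ("cards", "CARD_MENU"),
}
CUSTOMERS = {
    "4085551234": {"name": "J. Smith", "account": "001-22"},
}


def plan_call(ani, dnis):
    """Return the agent group, screen, and caller data for one incoming call."""
    # The dialed number identifies the nature of the call.
    group, screen = DNIS_TABLE.get(dnis, ("general", "MAIN_MENU"))
    # The calling number may be missing if the network does not supply ANI/CLID.
    customer = CUSTOMERS.get(ani) if ani else None
    return {"group": group, "screen": screen, "customer": customer}
```

A call with a known dialed number gets the matching group and screen; an unknown number falls back to a general group, which is one possible design choice rather than documented CAM behavior.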

To help monitor call center activity, CAM 
identifies and tracks all calls. CAM informs the 
relevant application when each call is connected, 
transferred, disconnected, or abandoned. In addi- 
tion, telemarketing functions can be enhanced 
with the make-call feature that can be invoked 
from the terminal keyboard. The host-enhanced
call routing feature also allows the application
to dynamically control which agent the call
should be offered to.


IVR Applications 


IVR applications can use CAM to perform all 
the same functions as applications that support 
live agents. CAM notifies IVR applications of 
call status events and allows the applications to 
control switch functions. Synchronized deliv- 
ery of voice calls and call data over the CTI 
link offers IVR applications enhanced flexibil- 
ity in call control. For example, an application 
can customize call handling for different callers 
or dialed numbers, allowing the VRU to respond 
to a call in a specific language, with a special 
script, or using customized processing logic. 


Figure 1. An automated call center using CAM. [Diagram label: Voice response unit (VRU)]

Coordinated Voice and VRU Data Transfer


The application must coordinate voice and data 
transfer when a call is screened by a VRU before 
being transferred to an agent, or when a live 
agent must escalate a call to a supervisor or 
transfer it to another agent in a different depart- 
ment. For customer satisfaction and agent effi- 
ciency, a voice-VRU coordinated application 
should provide to the receiving agent the data 
already obtained from the caller (either by a 
VRU or by an agent). 

When a call is transferred from a VRU to an 
agent, CAM coordinates with the IVR applica- 
tion and the switch to ensure that all data is 
routed to the application controlling the agent’s 
terminal. If the call is transferred to a second or 
third location, CAM ensures that all collected 
data accompanies the call by directing it to the 
appropriate application delivering data to the 
receiving agents. Thus, the agent answering a 
transferred call has the necessary background 
information to go to work immediately. Agents 
don’t waste time collecting the same informa- 
tion again, and caller problems or requests are 
resolved faster. 


Host-Enhanced Routing 

With CAM, a call center application on the 
Tandem system can help select the appropriate 
destination for incoming calls. CAM then passes 
routing commands to the switch. 

Before an incoming call is connected by the 
switch, CAM provides information about the 
call to the application, which can determine 
where to route the call. This determination can 
be based on the characteristics of available 
agents and information provided with the call. 
As well as making call routing more efficient, 
this feature allows the application to customize 
routing for specific callers such as those with 
special language needs or those requiring prior- 
ity handling. 
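
The routing decision described above can be sketched as follows. This is a minimal Python example under stated assumptions: the Agent fields, the language and priority criteria, and the choose_agent function are all hypothetical, and the step in which CAM passes the resulting routing command to the switch is not shown.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    extension: str
    languages: frozenset      # languages the agent can handle
    available: bool = True    # False while on a call or in wrap-up
    senior: bool = False      # assumed marker for priority handling


def choose_agent(agents, language="en", priority=False):
    """Return the extension of the first suitable available agent, or None."""
    for agent in agents:
        if not agent.available or language not in agent.languages:
            continue
        if priority and not agent.senior:
            continue
        return agent.extension
    return None
```

Returning None leaves the fallback open; an application might, for example, let the switch apply its default routing in that case.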


Automated Outbound Dialing 

Through CAM, a call center application can 
make outbound phone calls automatically. By 
using CAM, the application can ask the switch
to place a call to an internal or external location.
CAM gives the application access to other
outbound functions such as speed dialing or 
preview dialing that enhance agent productivity 
and eliminate dialing errors. In addition, the 
application can track successful calls and save 
abandoned or incomplete calls so that agents 
can automatically call back the unreached cus- 
tomers during less busy hours. Agents can 
make calls by selecting items from a menu; 
they don’t have to dial the numbers manually. 


Automated Control of Agent Status 


CAM offers agents a single point of access to 
automated call center functions. Agents can log 
into or out of the switch through the terminal 
keyboard. Without using the telephone set, they 
can automatically make themselves unavailable 
for call connection after they terminate a call. 
This simplifies agent activity and reduces errors, 
while providing time for activities such as wrap- 
ping up a call. Agents can thus be more effec- 
tive; there is no risk of getting an incoming call 
during call wrap-up. Moreover, agents can 
choose to be made available automatically 
when the application determines that they are 
ready to take a new call. 


Application Development Features 


CAM offers several benefits for call center appli- 
cation developers. By maintaining the CTI link 
and handling session-management functions 
(such as logging into the switch to request the 
use of CTI services and registering agents), 
CAM simplifies the application and allows 
developers to concentrate on the call handling. 
CAM translates call information received over 
the CTI link into a simpler and more uniform 
format, presenting the formatted information to 
the application. Developers can also choose to 
request the call-related information in its raw 
format to get any information they wish. 

In addition to supporting ease of develop- 
ment, the CAM API supports switch indepen- 
dence. By allowing a single application to 
communicate with multiple telephone switches, 
CAM can increase the flexibility of the applica- 
tion and further reduce development costs. 

Application Design

Applications using CAM can be written as mul- 
tiple programs. Developers can write each 
application function as a separate program that 
can tell CAM to deliver the appropriate mes- 
sages to it. For example, Process A can handle 
enhanced routing functions, while Process B 
handles status events. One can even separate 
the general call handling by agent groups that 
handle different functions. Thus, a separate 
process would manage each agent group. (See 
Figure 2.) 

CAM determines where to deliver specific 
messages by means of entries in the Directory 
Number (DN) file. (Developers must provide 
these entries.) CAM uses the DN file to deter- 
mine where to send route-request messages, 
status events, and call information for every 
agent. Developers can choose whether to use 
a single program or multiple programs in their 
application design. 
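
To make the idea of DN-file-driven delivery concrete, here is a small Python sketch. The real DN file format is not described in this article, so the dictionary below, its keys, and the process names are purely illustrative assumptions.

```python
# Hypothetical in-memory stand-in for DN file entries:
# (directory number, message kind) -> name of the receiving process.
DN_ENTRIES = {
    ("5001", "route_request"): "$ROUTE",
    ("5001", "status_event"): "$STAT",
    ("5002", "route_request"): "$ROUTE",
    ("5002", "status_event"): "$GRP2",
}


def destination(dn, kind, default="$MAIN"):
    """Look up which process should receive a message of this kind for this DN."""
    return DN_ENTRIES.get((dn, kind), default)
```

With a single-program design every entry would name the same process; with a multiple-program design the entries split the work, as in the Process A and Process B example above.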


Message-Based Interface 


To promote ease of development, the CAM API 
is a message-based interface with fixed-length 
messages and common fields in the same loca- 
tion for all messages. For example, every mes- 
sage begins with the message type, message 
length, and return-code fields. All switch- 
specific parameters are located at the end of 
the message. The API uses fixed-length mes- 
sages to make writing Pathway applications 
simpler and more efficient. By having common 
fixed-length messages, the Pathway application 
can allocate a single buffer for its receive buffer 
instead of multiple buffers for the different- 
sized messages. 
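
The benefit of a fixed-length layout with common leading fields can be illustrated with a short Python sketch. The field widths, byte order, and total message size below are assumptions chosen for the example, not the actual CAM message definition; only the ordering (message type, message length, return code first, switch-specific data last) comes from the description above.

```python
import struct

# Assumed layout: 2-byte message type, 2-byte message length,
# 2-byte return code, 32 bytes of common data, and 16 bytes of
# switch-specific data at the end of the message.
MESSAGE = struct.Struct(">HHh32s16s")
HEADER = struct.Struct(">HHh")


def pack_message(msg_type, return_code, common=b"", switch_specific=b""):
    """Build one fixed-length message; short fields are padded with zero bytes."""
    return MESSAGE.pack(msg_type, MESSAGE.size, return_code,
                        common, switch_specific)


def read_header(buffer):
    """Read the common leading fields, which sit at the same offsets in every message."""
    return HEADER.unpack_from(buffer, 0)


# Because every message has the same size, one receive buffer is enough.
receive_buffer = bytearray(MESSAGE.size)
```

The application can inspect the header of any message without first knowing which kind of message it received.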

The API provides an optional user header 
and CAM identifier to reduce the need to change 
an existing application. The user header gives 
developers some flexibility in integrating CAM 
into an existing application. The user header 
allows the application to identify the CAM mes- 
sage type by using the preexisting methods, 
which look for the message type in a particular 
offset from the beginning of the message. Simi- 
larly, the optional CAM identifier gives devel- 
opers an easy way to distinguish between CAM 
messages and messages from another program. 



Table 1. Switches supported by CAM.

Manufacturer      Switch          CTI link
Aspect            CallCenter ACD  ApplicationBridge
AT&T              Definity G3     CallVisor ASAI
Ericsson          MD110           ApplicationLink
Northern Telecom  Meridian 1      Meridian Link
Northern Telecom  DMS100          Compucall
Northern Telecom  SL100           SL100 Meridian SCAI
Rolm              9751            CallBridge CSTA
Siemens           Hicom 300       CallBridge ACL


Switch Independence 

The CAM API is a switch-independent command 
set. CAM currently supports seven different 
switches, each of which behaves differently in 
many ways. CAM hides most of these differ- 
ences from the application, which provides sev- 
eral advantages for writing an application. Some 
call centers have multiple telephone switches, 
which may come from different manufacturers. 
With the common interface provided by CAM, 
users can change to a different switch without 
having to change the application. Table 1 lists 
the switches supported by CAM. 
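
One way to picture how a common interface hides switch differences is a set of per-switch converters that all produce the same record. The Python sketch below is only an analogy: the two switch types, their raw field names, and the common record are invented for illustration and do not reflect any real CTI link format.

```python
def from_switch_a(raw):
    # Hypothetical switch A reports separate "calling" and "called" fields.
    return {"ani": raw["calling"], "dnis": raw["called"]}


def from_switch_b(raw):
    # Hypothetical switch B reports a single "origin>destination" string.
    ani, dnis = raw["route"].split(">")
    return {"ani": ani, "dnis": dnis}


CONVERTERS = {"A": from_switch_a, "B": from_switch_b}


def to_common(switch_type, raw):
    """Convert a switch-specific event into the common record the application sees."""
    return CONVERTERS[switch_type](raw)
```

Application code that reads only the common record does not need to change when a different switch type is added; only a new converter is required.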


Figure 2. A call center application consisting of multiple programs.

Figure 3. Three uses of the CAM/DF feature. (a) CAM/DF used as a switch simulator. (b) CAM/DF used as an application simulator. (c) CAM/DF supplementing some functions of the application.


The Tandem Alliance partners that write call 
center applications can benefit especially from 
the switch-independent command set, since 
they have to write applications that communicate 
with all the switches. Using CAM will reduce 
costs for the Alliance partners; they can pass 
on some of these savings to Tandem users in 
the form of simpler and quicker application 
development. 


Development Tools 


The CAM product comes with a development 
tool kit, the CAM Development Facility 
(CAM/DF), which helps speed up application
development. CAM/DF contains a switch
simulator and an application simulator. As a 
switch simulator, CAM/DF gives programmers 
the flexibility of developing the application 
without having a switch. The switch simulator 
reduces the complexities that arise from deal- 
ing with the details of the real switch and iso- 
lates the effort of integrating CAM with the 
application. 


Conversely, one can use the CAM/DF appli- 
cation simulator to integrate CAM with the 
switch before introducing the application. The 
application simulator can coexist with the call 
center application during its development 
phase. The application simulator can monitor 
and handle some functions, while the call center 
application performs other functions. As pro- 
grammers continue to develop the application, 
they can gradually eliminate the dependence on 
the CAM/DF. Figure 3 illustrates the uses of the 
CAM/DF. 


Call Center Architecture 


With CAM, one can design an automated call 
center in many ways. Some call centers have a 
VRU that can take incoming calls, collect infor- 
mation such as account numbers, and provide 
for the transfer of calls to live agents. The appli- 
cation program logic can automatically transfer 
a call from the VRU to an agent; the application 
tells CAM to transfer the call. Furthermore, the 
application can pass any collected data to the 
destination agent by attaching the data to the 
transfer-request message, which the application 
sends to CAM to initiate the transfer. 

Another option is to use a switch that has a 
built-in, integrated VRU instead of using a VRU 
provided by a third party. This architecture has 
the advantage of using a single communication 
interface to the switch. Architectures that have 
separate VRUs need two communication inter- 
faces, one for the CTI link and a separate one 
for the VRU. Having a single interface can sim- 
plify the design of the application. 

Regardless of the architecture and the switch
type one chooses, one needs to make only mini- 
mal changes to an existing application, since the 
CAM API is virtually the same for all switches. 
The only code one must change is the process- 
ing of the switch-specific data fields in certain 
messages. 

Furthermore, using the CAM routing feature,
one can move into the application many
call-routing decisions traditionally made by the
switch. The advantage is that the application
can dynamically control call routing.

Assume, for example, that a home-shopping 
call center has a promotional sale item and cus- 
tomers are asked to call a certain phone number 
to order the item. One can configure the appli- 
cation to route all the phone calls to this num- 
ber to a particular group of agents who have 
been alerted to the sale. When another sales 
item is advertised with another phone number, 
one can configure the application to send those 
calls to another group of agents. The applica- 
tion can easily handle these dynamic changes in 
the call center. However, if the switch were to 
handle the call routing rather than the applica- 
tion, the routing changes would require more 
time to configure, and there would be downtime 
for the phones affected by these changes. 


Conclusion 


By providing a simple, uniform API command 
set that handles computer-telephone integration 
(CTI), CAM can speed up the development of 
call center applications, thereby reducing the 
cost of automating a call center. With CAM, 
users can choose a variety of call center archi- 
tectures that incorporate features such as VRUs, 
call routing dynamically controlled by the appli- 
cation, and the simultaneous delivery (or trans- 
fer) of customer information with the customer’s 
phone call. CAM also provides switch indepen- 
dence, giving the application flexibility when it 
comes to choosing, adding, or changing tele- 
phone switches. 


Benefits of Automating 
a Call Center With 
Tandem Computers 


Telephone switches generally
have been built to be reliable,
with minimal downtime. However,
it does no good to have
the telephones working if agents
can’t access the information in 
the computer because of a com- 
puter failure. Tandem’s NonStop 
computers and software environ- 
ment complement the reliability 
of the telephone switches by pro- 
viding continuous availability and 
data integrity. Thus, Tandem sys- 
tems are ideally suited for call 
center applications. 


Third-Party Solutions 


Tandem has partnerships, through its 
Alliance programs, with several compa- 
nies that write call center application 
software. Different Alliance partners 
focus on addressing different call center 
needs. For example, some partners spe- 
cialize in interacting with a VRU. Other 
partners specialize in telephone deliv- 
ery system software, which provides 
solutions for customer service, auto- 
mated sales support, and account man- 
agement. By forming these partnerships, 
Tandem can provide a total business 
solution to a call center’s requirements. 

For information about Tandem part- 
ners who provide call center solutions, 
refer to the Tandem Alliance 1994 
Solutions & Services Directory (1993), 
or its most current version. 


Thus, CAM can make it possible for a call
center application to provide enhanced services
to customers. In addition, CAM reduces the time 
it takes to handle each call, which allows fewer 
agents to serve the same number of customers 
and thus substantially lowers the cost of run- 
ning a call center. 

Tandem Systems Review Index


The Tandem Journal became the Tandem Systems Review in February 1985. Four issues of the 
Tandem Journal were published: 


Volume 1, No. 1   Fall 1983      Volume 2, No. 2   Spring 1984
Volume 2, No. 1   Winter 1984    Volume 2, No. 3   Summer 1984


As of this issue, 25 issues of the Tandem Systems Review have been published: 


Volume 1, No. 1   Feb. 1985     Volume 5, No. 1   April 1989    Volume 9, No. 2    Spring 1993
Volume 1, No. 2   June 1985     Volume 5, No. 2   Sept. 1989    Volume 9, No. 3    Summer 1993
Volume 2, No. 1   Feb. 1986     Volume 6, No. 1   March 1990    Volume 9, No. 4    Fall 1993
Volume 2, No. 2   June 1986     Volume 6, No. 2   Oct. 1990     Volume 10, No. 1   Jan. 1994
Volume 2, No. 3   Dec. 1986     Volume 7, No. 1   April 1991    Volume 10, No. 2   April 1994
Volume 3, No. 1   March 1987    Volume 7, No. 2   Oct. 1991
Volume 3, No. 2   Aug. 1987     Volume 8, No. 1   Spring 1992
Volume 4, No. 1   Feb. 1988     Volume 8, No. 2   Summer 1992
Volume 4, No. 2   July 1988     Volume 8, No. 3   Fall 1992
Volume 4, No. 3   Oct. 1988     Volume 9, No. 1   Winter 1993


The articles published in all 29 issues are arranged by subject below. (Tandem Journal is abbreviated as TJ and
Tandem Systems Review as TSR.) A second index, arranged by product, is also provided.


Index by Subject 
Article title   Author(s)   Publication   Volume, Issue   Publication date   Part number
APPLICATION DEVELOPMENT AND LANGUAGES 
A New Design for the PATHWAY TCP R. Wong TJ 2,2 Spring 1984 83932 
An Overview of Client/Server Computing on Tandem Systems H. Cooperstein TSR 8,3 Fall 1992 89803 
An Introduction to Tandem EXTENDED BASIC J. Meyerson TJ 2,2 Spring 1984 83932 
Application Code Conversion for D-Series Systems K. Liu TSR 9,2 Spring 1993 89805 
Application Profile: Storing Macintosh Graphics on the D. Broyles TSR 9,3 Summer 1993 89806 
Tandem 5200 Optical Storage Facility 
Automating Call Centers With CAM W. Choi TSR 10,2 April 1994 104398 
Basic Uses and New Features of Extended GDS A. Hotea TSR 10,1 Jan. 1994 104396 
Debugging TACL Code L. Palmer TSR 4,2 July 1988 13693 
Designing and Implementing a Graphical User Interface S. Wolfe TSR 9,3 Summer 1993 89806 
Designing Client/Server Applications for OLTP on W. Culman TSR 8,3 Fall 1992 89803 
Guardian 90 Systems 
Extending the Client/Server Model With Object-Oriented Technology T. Rohner TSR 10,1 Jan. 1994 104396
Implementing Client/Server Using RSC M. Iem, T. Kocher TSR 8,3 Fall 1992 89803
Instrumenting Applications for Effective Event Management J. Dagenais TSR 7,2 Oct. 1991 65248
New TAL Features C. Lu, J. Murayama TSR 2,2 June 1986 83937
PATHFINDER-An Aid for Application Development S. Benett TJ 1,1 Fall 1983 83930
PATHWAY IDS: A Message-level Interface to Devices and Processes M. Anderton, M. Noonan TSR 2,2 June 1986 83937 
The RESPOND OLTP Business Management System H. Bolling, W. Bronson TSR 9,1 Winter 1993 89804 
for Manufacturing 
State-of-the-Art C Compiler E. Kit TSR 2,2 June 1986 83937 
TACL, Tandem’s New Extensible Command Language J. Campbell, R. Glascock TSR 2,1 Feb. 1986 83936 
Tandem’s New COBOL85 D. Nelson TSR 2,1 Feb. 1986 83936
The DAL Server: Client/Server Access to Tandem Databases W. Schlansky, TSR 9,1 Winter 1993 89804
J. Schrengohst 
The ENABLE Program Generator for Multifile Applications B. Chapman, J. Zimmerman TSR 1,1 Feb. 1985 83934 
TMF and the Multi-Threaded Requester T. Lemberger TJ 1,1 Fall 1983 83930 
Writing a Command Interpreter D. Wong TSR 1,2 June 1985 83935 
CLIENT/SERVER 
An Overview of Client/Server Computing on Tandem Systems H. Cooperstein TSR 8,3 Fall 1992 89803
Application Profile: Storing Macintosh Graphics on the D. Broyles TSR 9,3 Summer 1993 89806 
Tandem 5200 Optical Storage Facility 
Client/Server Availability A. Wood TSR 10,2 April 1994 104398 
Designing and Implementing a Graphical User Interface S. Wolfe TSR 9,3 Summer 1993 89806 
Designing Client/Server Applications for OLTP on W. Culman TSR 8,3 Fall 1992 89803 
Guardian 90 Systems 
Extending the Client/Server Model With Object-Oriented Technology T. Rohner TSR 10,1 Jan. 1994 104396
Gateways to NonStop SQL D. Slutz TSR 6,2 Oct. 1990 46987
Implementing Client/Server Using RSC M. Iem, T. Kocher TSR 8,3 Fall 1992 89803
The DAL Server: Client/Server Access to Tandem Databases W. Schlansky, TSR 9,1 Winter 1993 89804 
J. Schrengohst 
DATA COMMUNICATIONS 
An Overview of SNAX/CDF M. Turner TSR 5,2 Sept. 1989 28152 
A SNAX Passthrough Tutorial D. Kirk TJ 2,2 Spring 1984 83932 
Basic Uses and New Features of Extended GDS A. Hotea TSR 10,1 Jan. 1994 104396
Changes in FOX N. Donde TSR 1,2 June 1985 83935
Connecting Terminals and Workstations to Guardian 90 Systems E. Siegel TSR 8,2 Summer 1992 69848 
Expand High-Performance Solutions D. Smith TSR 9,3 Summer 1993 89806 
Introduction to MULTILAN A. Coyle TSR 4,1 Feb. 1988 11078 
Overview of the MULTILAN Server A. Rowe TSR 4,1 Feb. 1988 11078
SNAX/APC: Tandem’s New SNA Software for Distributed Processing B. Grantham TSR 3,1 March 1987 83939 
SNAX/HLS: An Overview S. Saltwick TSR 1,2 June 1985 83935 
TLAM: A Connectivity Option for Expand K. MacKenzie TSR 7,1 April 1991 46988
Using the MULTILAN Application Interfaces M. Berg, A. Rowe TSR 4,1 Feb. 1988 11078 
A Comparison of the B00 DP1 and DP2 Disc Processes T. Schachter TSR 1,2 June 1985 83935
An Overview of NonStop SQL Release 2 M. Pong TSR 6,2 Oct. 1990 46987 
Batch Processing in Online Enterprise Computing T. Keefauver TSR 6,2 Oct. 1990 46987 
Concurrency Control Aspects of Transaction Design W. Senf TSR 6,1 March 1990 32986
Converting Database Files from ENSCRIBE to NonStop SQL W. Weikel TSR 6,1 March 1990 32986 
DP1-DP2 File Conversion: An Overview J. Tate TSR 2,1 Feb. 1986 83936 
Determining FCP Conversion Time J. Tate TSR 2,1 Feb. 1986 83936 
DP2’s Efficient Use of Cache T. Schachter TSR 1,2 June 1985 83935 
DP2 Highlights K. Carlyle, L. McGowan TSR 1,2 June 1985 83935 
DP2 Key-sequenced Files T. Schachter TSR 1,2 June 1985 83935 
Gateways to NonStop SQL D. Slutz TSR 6,2 Oct. 1990 46987
High-Performance SQL Through Low-Level System Integration A. Borr TSR 4,2 July 1988 13693 
Improvements in TMF T. Lemberger TSR 1,2 June 1985 83935 
NetBatch: Managing Batch Processing on Tandem Systems D. Wakashige TSR 5,1 April 1989 18662
NetBatch-Plus: Structuring the Batch Environment G. Earle, D. Wakashige TSR 6,1 March 1990 32986 
NonStop SQL: The Single Database Solution J. Cassidy, T. Kocher TSR 5,2 Sept. 1989 28152 
NonStop SQL Data Dictionary R. Holbrook, D. Tsou TSR 4,2 July 1988 13693 
NonStop SQL Optimizer: Basic Concepts M. Pong TSR 4,2 July 1988 13693 
NonStop SQL Optimizer: Query Optimization and User Influence M. Pong TSR 4,2 July 1988 13693 
NonStop SQL Reliability C. Fenner TSR 4,2 July 1988 13693 
Online Information Processing J. Viescas TSR 9,1 Winter 1993 89804 
Online Reorganization of Key-Sequenced Tables and Files G. Smith TSR 6,2 Oct. 1990 46987 
Optimizing Batch Performance T. Keefauver TSR 5,2 Sept. 1989 28152 
Overview of NonStop SQL H. Cohen TSR 4,2 July 1988 13693
Parallelism in NonStop SQL Release 2 M. Moore, A. Sodhi TSR 6,2 Oct. 1990 46987 
The NonStop SQL Release 2 Benchmark S. Englert, J. Gray, TSR 6,2 Oct. 1990 46987 
T. Kocher, P. Shah 
The Outer Join in NonStop SQL J. Vaishnav TSR 6,2 Oct. 1990 46987 
The Relational Data Base Management Solution G. Ow TJ 2,1 Winter 1984 83931
Tandem’s NonStop SQL Benchmark Tandem Performance Group TSR 4,1 Feb. 1988 11078
The TRANSFER Delivery System for Distributed Applications S. Van Pelt TJ 2,2 Spring 1984 83932
TMF Autorollback: A New Recovery Feature M. Pong TSR 1,1 Feb. 1985 83934 
DECISION SUPPORT SYSTEMS 
Online Information Processing J. Viescas TSR 9,1 Winter 1993 89804 
The DAL Server: Client/Server Access to Tandem Databases W. Schlansky, TSR 9,1 Winter 1993 89804 
J. Schrengohst 
The RESPOND OLTP Business Management System H. Bolling, W. Bronson TSR 9,1 Winter 1993 89804
for Manufacturing 
OBJECT-ORIENTED TECHNOLOGY 
Extending the Client/Server Model With Object-Oriented Technology T. Rohner TSR 10,1 Jan. 1994 104396
OPERATING SYSTEMS
Application Code Conversion for D-Series Systems K. Liu TSR 9,2 Spring 1993 89805 
Highlights of the B00 Software Release K. Coughlin, R. Montevaldo TSR 1,2 June 1985 83935
Increased Code Space A. Jordan TSR 1,2 June 1985 83935 
Managing System Time Under GUARDIAN 90 E. Nellen TSR 2,1 Feb. 1986 83936
Migration Planning for D-Series Systems S. Kuukka TSR 9,2 Spring 1993 89805 
New GUARDIAN 90 Time-keeping Facilities E. Nellen TSR 1,2 June 1985 83935 
New Process-timing Features S. Sharma TSR 1,2 June 1985 83935 
NonStop II Memory Organization and Extended Addressing D. Thomas TJ 1,1 Fall 1983 83930
Overview of the C00 Release L. Marks TSR 4,1 Feb. 1988 11078
Overview of the D-Series Guardian 90 Operating System W. Bartlett TSR 9,2 Spring 1993 89805 
Overview of the NonStop-UX Operating System for the Integrity S2 P. Norwood TSR 7,1 April 1991 46988
Robustness to Crash in a Distributed Data Base: A. Borr TSR 1,2 June 1985 83935 
A Nonshared-memory Approach 
The GUARDIAN Message System and How to Design for It M. Chandra TSR 1,1 Feb. 1985 83934 
The NonStop Himalaya K10000 Interprocessor Bus R. Jardine, S. Hamilton, TSR 10,2 April 1994 104398 
K. Krishnakumar 
The Tandem Global Update Protocol R. Carr TSR 1,2 June 1985 83935 
PERFORMANCE AND CAPACITY PLANNING 
A Performance Retrospective P. Oleinick, P. Shah TSR 2,3 Dec. 1986 83938
Buffering for Better Application Performance R. Mattran TSR 2,1 Feb. 1986 83936 
Capacity Planning Concepts R. Evans TSR 2,3 Dec. 1986 83938 
Capacity Planning With TCM W. Highleyman TSR 7,2 Oct. 1991 65248
C00 TMDS Performance J. Mead TSR 4,1 Feb. 1988 11078
Credit-authorization Benchmark for High Performance and T. Chmiel, T. Houy TSR 2,1 Feb. 1986 83936 
Linear Growth 
Debugging Accelerated Programs on TNS/R Systems D. Cressler TSR 8,1 Spring 1992 65250 
DP2 Performance J. Enright TSR 1,2 June 1985 83935 
Estimating Host Response Time in a Tandem System H. Horwitz TSR 4,3 Oct. 1988 15748 
Expand High-Performance Solutions D. Smith TSR 9,3 Summer 1993 89806 
FASTSORT: An External Sort Using Parallel Processing J. Gray, M. Stewart, TSR 2,3 Dec. 1986 83938 
A. Tsukerman, S. Uren, 
B. Vaughan 
Getting Optimum Performance from Tandem Tape Systems A. Khatri TSR 2,3 Dec. 1986 83938 
How to Set Up a Performance Data Base with M. King TSR 2,3 Dec. 1986 83938 
MEASURE and ENFORM 
Implementing a Systems Management Improvement Program J. Dagenais TSR 9,4 Fall 1993 89807
Improved Performance for BACKUP2 and RESTORE2 A. Khatri, M. McCline TSR 1,2 June 1985 83935 
Improving Performance on TNS/R Systems With the Accelerator M. Blanchet TSR 8,1 Spring 1992 65250 
MEASURE: Tandem’s New Performance Measurement Tool D. Dennison TSR 2,3 Dec. 1986 83938 
Measuring DSM Event Management Performance M. Stockton TSR 8,1 Spring 1992 65250 
Message System Performance Enhancements D. Kinkade TSR 2,3 Dec. 1986 83938 
Message System Performance Tests S. Uren TSR 2,3 Dec. 1986 83938 
Network Design Considerations J. Evjen TSR 5,2 Sept. 1989 28152 
NonStop NET/MASTER: Configuration and Performance Guidelines M. Stockton TSR 9,4 Fall 1993 89807 
NonStop VLX Performance J. Enright TSR 2,3 Dec. 1986 83938 
Optimizing Sequential Processing on the Tandem System R. Welsh TJ 2,3 Summer 1984 83933 
Pathway TCP Enhancements for Application Run-Time Support R. Vannucci TSR 7,1 April 1991 46988
Performance Benefits of Parallel Query Execution and Mixed S. Englert, J. Gray TSR 6,2 Oct. 1990 46987 
Workload Support in NonStop SQL Release 2 
Performance Considerations for Application Processes R. Glasstone TSR 2,3 Dec. 1986 83938 
Performance Measurements of an ATM Network Application N. Cabell, D. Mackie TSR 2,3 Dec. 1986 83938
Predicting Response Time in On-line Transaction Processing Systems A. Khatri TSR 2,2 June 1986 83937 
The 6600 and TCC6820 Communications Controllers: P. Beadles TSR 2,3 Dec. 1986 83938 
A Performance Comparison 
The ENCORE Stress Test Generator for On-line Transaction S. Kosinski TJ 2,1 Winter 1984 83931 
Processing Applications 
The PATHWAY TCP: Performance and Tuning J. Vatz TSR 1,1 Feb. 1985 83934 
The Performance Characteristics of Tandem NonStop Systems J. Day TJ 1,1 Fall 1983 83930 
Sizing Cache for Applications that Use B-series DP1 and TMF P. Shah TSR 2,2 June 1986 83937 
Sizing the Spooler Collector Data File H. Norman TSR 4,1 Feb. 1988 11078
Tandem’s 5200 Optical Storage Facility: Performance and S. Coleman TSR 5,1 April 1989 18662 
Optimization Considerations 
Tandem’s Approach to Fault Tolerance B. Ball, W. Bartlett, TSR 4,1 Feb. 1988 11078
S. Thompson 
Understanding PATHWAY Statistics R. Wong TJ 2,2 Spring 1984 83932 
PERIPHERALS 
5120 Tape Subsystem Recording Technology W. Phillips TSR 3,2 Aug. 1987 83940
An Introduction to DYNAMITE Workstation Host Integration S. Kosinski TSR 1,2 June 1985 83935 
Application Profile: Storing Macintosh Graphics on the D. Broyles TSR 9,3 Summer 1993 89806 
Tandem 5200 Optical Storage Facility 
Data-Encoding Technology Used in the XL8 Storage Facility D. S. Ng TSR 2,2 June 1986 83937
Data-Window Phase-Margin Analysis A. Painter, H. Pham, TSR 2,2 June 1986 83937 
H. Thomas 
Introducing the 3207 Tape Controller S. Chandran TSR 1,2 June 1985 83935 
Peripheral Device Interfaces J. Blakkan TSR 3,2 Aug. 1987 83940 
Plated Media Technology Used in the XL8 Storage Facility D. S. Ng TSR 2,2 June 1986 83937
Streaming Tape Drives J. Blakkan TSR 3,2 Aug. 1987 83940 
Terminal Selection E. Siegel TSR 8,2 Summer 1992 69848 
The 5200 Optical Storage Facility: A Hardware Perspective A. Patel TSR 5,1 April 1989 18662 
The 6100 Communications Subsystem: A New Architecture R. Smith TJ 2,1 Winter 1984 83931
The 6600 and TCC6820 Communications Controllers: P. Beadles TSR 2,3 Dec. 1986 83938 
A Performance Comparison 
The DYNAMITE Workstation: An Overview G. Smith TSR 1,2 June 1985 83935 
The Model 6VI Voice Input Option: Its Design and Implementation B. Huggett TJ 2,3 Summer 1984 83933 
The Role of Optical Storage in Information Processing L. Sabaroff TSR 3,2 Aug. 1987 83940 
The V8 Disc Storage Facility: Setting a New Standard for M. Whiteman TSR 1,2 June 1985 83935 
PROCESSORS
Fault Tolerance in the NonStop Cyclone System S. Chan, R. Jardine TSR 7,1 April 1991 46988 
A Hardware Overview of the NonStop Himalaya K10000 Server C. Kong TSR 10,1 Jan. 1994 104396 
NonStop CLX: Optimized for Distributed On-Line D. Lenoski TSR 5,1 April 1989 18662 
Transaction Processing 
NonStop VLX Hardware Design M. Brown TSR 2,3 Dec. 1986 83938 
Overview of Tandem NonStop Series/RISC Systems L. Faby, R. Mateosian TSR 8,1 Spring 1992 65250 
The High-Performance NonStop TXP Processor W. Bartlett, T. Houy, TJ 2,1 Winter 1984 83931 
Transaction Processing D. Meyer 
The NonStop Himalaya K10000 Interprocessor Bus R. Jardine, S. Hamilton, TSR 10,2 April 1994 104398 
K. Krishnakumar 
The NonStop TXP Processor: A Powerful Design for On-line P. Oleinick TJ 2,3 Summer 1984 83933
Transaction Processing 
The VLX: A Design for Serviceability J. Allen, R. Boyle TSR 3,1 March 1987 83939 
SECURITY 
Dial-In Security Considerations P. Grainger TSR 7,2 Oct. 1991 65248 
Distributed Protection with SAFEGUARD T. Chou TSR 2,2 June 1986 83937 
Enhancing System Security With Safeguard C. Gaydos TSR 7,1 April 1991 46988
SYSTEM CONNECTIVITY 
Basic Uses and New Features of Extended GDS A. Hotea TSR 10,1 Jan. 1994 104396 
Building Open Systems Interconnection with OSI/AS and OSI/TS R. Smith TSR 6,1 March 1990 32986
Connecting Terminals and Workstations to Guardian 90 Systems E. Siegel TSR 8,2 Summer 1992 69848 
Implementing Client/Server Using RSC M. Iem, T. Kocher TSR 8,3 Fall 1992 89803
Network Design Considerations J. Evjen TSR 5,2 Sept. 1989 28152 
Terminal Connection Alternatives for Tandem Systems J. Simonds TSR 5,1 April 1989 18662 
Terminal Selection E. Siegel TSR 8,2 Summer 1992 69848 
The OSI Model: Overview, Status, and Current Issues A. Dunn TSR 5,1 April 1989 18662 
SYSTEM MANAGEMENT 
Configuring Tandem Disk Subsystems S. Sitler TSR 2,3 Dec. 1986 83938 
Data Replication in Tandem’s Distributed Name Service T. Eastep TSR 4,3 Oct. 1988 15748 
Enhancements to TMDS L. White TSR 3,2 Aug. 1987 83940 
Event Management Service Design and Implementation H. Jordan, R. McKee, TSR 4,3 Oct. 1988 15748 
R. Schuet 
Implementing a Systems Management Improvement Program J. Dagenais TSR 9,4 Fall 1993 89807 
Instrumenting Applications for Effective Event Management J. Dagenais TSR 7,2 Oct. 1991 65248 
Introducing TMDS, Tandem’s New On-line Diagnostic System J. Troisi TSR 1,2 June 1985 83935 
Measuring DSM Event Management Performance M. Stockton TSR 8,1 Spring 1992 65250 
Network Statistics System M. Miller TSR 4,3 Oct. 1988 15748 
NonStop NET/MASTER: Configuration and Performance Guidelines M. Stockton TSR 9,4 Fall 1993 89807 
NonStop NET/MASTER: Event Management Architecture M. Stockton TSR 9,4 Fall 1993 89807 
NonStop NET/MASTER: Event Processing Costs and M. Stockton TSR 9,4 Fall 1993 89807 
Sizing Calculations 
Overview of DSM P. Homan, B. Malizia, TSR 4,3 Oct. 1988 15748 
E. Reisner 
SCP and SCF: A General Purpose Implementation of the T. Lawson TSR 4,3 Oct. 1988 15748 
Subsystem Programmatic Interface 
RDF: An Overview J. Guerrero TSR 7,2 Oct. 1991 65248 
RDF Synchronization F. Jongma, W. Senf TSR 8,2 Summer 1992 69848 
Tandem’s Subsystem Programmatic Interface G. Tom TSR 4,3 Oct. 1988 15748 
Using FOX to Move a Fault-tolerant Application C. Breighner TSR 1,1 Feb. 1985 83934 
Using the Subsystem Programmatic Interface and Event K. Stobie TSR 4,3 Oct. 1988 15748 
Management Services 

VIEWPOINT Operations Console Facility R. Hansen, G. Stewart TSR 4,3 Oct. 1988 15748 
VIEWSYS: An On-line System-resource Monitor D. Montgomery TSR 1,2 June 1985 83935 
Writing Rules for Automated Operations J. Collins TSR 7,2 Oct. 1991 65248 
Enhancements to PS MAIL R. Funk TSR 3,1 March 1987 83939 
3207 TAPE CONTROLLER 

Introducing the 3207 Tape Controller S. Chandran TSR 1,2 June 1985 83935 
5120 TAPE SUBSYSTEM 

5120 Tape Subsystem Recording Technology W. Phillips TSR 3,2 Aug. 1987 83940 
5200 OPTICAL STORAGE 

Application Profile: Storing Macintosh Graphics on the D. Broyles TSR 9,3 Summer 1993 89806 

Tandem 5200 Optical Storage Facility 

Tandem’s 5200 Optical Storage Facility: Performance and S. Coleman TSR 5,1 April 1989 18662 

Optimization Considerations 

The 5200 Optical Storage Facility: A Hardware Perspective A. Patel TSR 5,1 April 1989 18662 

The Role of Optical Storage in Information Processing L. Sabaroff TSR 4,1 Feb. 1988 11078 
6100 COMMUNICATIONS SUBSYSTEM 

The 6100 Communications Subsystem: A New Architecture R. Smith TJ 2,1 Winter 1984 83931 
6530 TERMINAL 

The Model 6VI Voice Input Option: Its Design and Implementation B. Huggett TJ 2,3 Summer 1984 83933 
6600 AND TCC6820 COMMUNICATIONS CONTROLLERS 

The 6600 and TCC6820 Communications Controllers: P. Beadles TSR 2,3 Dec. 1986 83938 

A Performance Comparison 
BASIC 

An Introduction to Tandem EXTENDED BASIC J. Meyerson TJ 2,2 Spring 1984 83932 
C 

State-of-the-art C Compiler E. Kit TSR 2,2 June 1986 83937 
CAM 

Automating Call Centers With CAM W. Choi TSR 10,2 April 1994 104398 
CIS 

Customer Information Service J. Massucco TSR 3,1 March 1987 83939 
CLX 

NonStop CLX: Optimized for Distributed On-Line D. Lenoski TSR 5,1 April 1989 18662 

Transaction Processing 
COBOL85 

Tandem’s New COBOL85 D. Nelson TSR 2,1 Feb. 1986 83936 
COMINT (CI) 

Writing a Command Interpreter D. Wong TSR 1,2 June 1985 83935 
CYCLONE 

Fault Tolerance in the NonStop Cyclone System S. Chan, R. Jardine TSR 7,1 April 1991 46988 
DAL SERVER 

The DAL Server: Client/Server Access to Tandem Databases W. Schlansky, TSR 9,1 Winter 1993 89804 
DP1 AND DP2 
A Comparison of the B00 DP1 and DP2 Disc Processes T. Schachter TSR 1,2 June 1985 83935 
Determining FCP Conversion Time J. Tate TSR 2,1 Feb. 1986 83936 
DP1-DP2 File Conversion: An Overview J. Tate TSR 2,1 Feb. 1986 83936 
DP2 Highlights K. Carlyle, L. McGowan TSR 1,2 June 1985 83935 
DP2 Key-sequenced Files T. Schachter TSR 1,2 June 1985 83935 
DP2 Performance J. Enright TSR 1,2 June 1985 83935 
DP2’s Efficient Use of Cache T. Schachter TSR 1,2 June 1985 83935 
Sizing Cache for Applications that Use B-series DP1 and TMF P. Shah TSR 2,2 June 1986 83937 
DSM 
Data Replication in Tandem’s Distributed Name Service T. Eastep TSR 4,3 Oct. 1988 15748 
Event Management Service Design and Implementation H. Jordan, R. McKee, TSR 4,3 Oct. 1988 15748 
R. Schuet 
Instrumenting Applications for Effective Event Management J. Dagenais TSR 7,2 Oct. 1991 65248 
Measuring DSM Event Management Performance M. Stockton TSR 8,1 Spring 1992 65250 
Network Statistics System M. Miller TSR 4,3 Oct. 1988 15748 
Overview of DSM P. Homan, B. Malizia, TSR 4,3 Oct. 1988 15748 
E. Reisner 
SCP and SCF: A General Purpose Implementation of the T. Lawson TSR 4,3 Oct. 1988 15748 
Subsystem Programmatic Interface 
Tandem’s Subsystem Programmatic Interface G. Tom TSR 4,3 Oct. 1988 15748 
Using the Subsystem Programmatic Interface and Event K. Stobie TSR 4,3 Oct. 1988 15748 
Management Services 
VIEWPOINT Operations Console Facility R. Hansen, G. Stewart TSR 4,3 Oct. 1988 15748 
Writing Rules for Automated Operations J. Collins TSR 7,2 Oct. 1991 65248 
DYNAMITE 
An Introduction to DYNAMITE Workstation Host Integration S. Kosinski TSR 1,2 June 1985 83935 
The DYNAMITE Workstation: An Overview G. Smith TSR 1,2 June 1985 83935 
ENABLE 
The ENABLE Program Generator for Multifile Applications B. Chapman, J. Zimmerman TSR 1,1 Feb. 1985 83934 
ENCOMPASS 
The Relational Data Base Management Solution G. Ow TJ 2,1 Winter 1984 83931 
ENCORE 
The ENCORE Stress Test Generator for On-line Transaction S. Kosinski TJ 2,1 Winter 1984 83931 
Processing Applications 
ENSCRIBE 
Converting Database Files from ENSCRIBE to NonStop SQL W. Weikel TSR 6,1 March 1990 32986 
EXPAND 
Expand High-Performance Solutions D. Smith TSR 9,3 Summer 1993 89806 
FASTSORT 
FASTSORT: An External Sort Using Parallel Processing J. Gray, M. Stewart, TSR 2,3 Dec. 1986 83938 
A. Tsukerman, S. Uren, 
B. Vaughan 
FOX 
Changes in FOX N. Donde TSR 1,2 June 1985 83935 
Using FOX to Move a Fault-tolerant Application C. Breighner TSR 1,1 Feb. 1985 83934 
FUP 
Online Reorganization of Key-Sequenced Tables and Files G. Smith TSR 6,2 Oct. 1990 46987 
GDS 
Basic Uses and New Features of Extended GDS A. Hotea TSR 10,1 Jan. 1994 104396 
GUARDIAN 90 
Application Code Conversion for D-Series Systems K. Liu TSR 9,2 Spring 1993 89805 
B00 Software Manuals S. Olds TSR 1,2 June 1985 83935 
C00 Software Manuals E. Levi TSR 4,1 Feb. 1988 11078 
Highlights of the B00 Software Release K. Coughlin, R. Montevaldo TSR 1,2 June 1985 83935 
Improved Performance for BACKUP2 and RESTORE2 A. Khatri, M. McCline TSR 1,2 June 1985 83935 
Increased Code Space A. Jordan TSR 1,2 June 1985 83935 
Managing System Time Under GUARDIAN 90 E. Nellen TSR 2,1 Feb. 1986 83936 
Message System Performance Enhancements D. Kinkade TSR 2,3 Dec. 1986 83938 
Message System Performance Tests S. Uren TSR 2,3 Dec. 1986 83938 
Migration Planning for D-Series Systems S. Kuukka TSR 9,2 Spring 1993 89805 
New GUARDIAN 90 Time-keeping Facilities E. Nellen TSR 1,2 June 1985 83935 
New Process-timing Features S. Sharma TSR 1,2 June 1985 83935 
NonStop II Memory Organization and Extended Addressing D. Thomas TJ 1,1 Fall 1983 83930 
Overview of the C00 Release L. Marks TSR 4,1 Feb. 1988 11078 
Overview of the D-Series Guardian 90 Operating System W. Bartlett TSR 9,2 Spring 1993 89805 
Robustness to Crash in a Distributed Data Base: A. Borr TSR 1,2 June 1985 83935 
A Nonshared-memory Multiprocessor Approach 
Tandem’s Approach to Fault Tolerance B. Ball, W. Bartlett, TSR 4,1 Feb. 1988 11078 
S. Thompson 
The GUARDIAN Message System and How to Design for It M. Chandra TSR 1,1 Feb. 1985 83934 
The Tandem Global Update Protocol R. Carr TSR 1,2 June 1985 83935 
HIMALAYA 
A Hardware Overview of the NonStop Himalaya K10000 Server C. Kong TSR 10,1 Jan. 1994 104396 
The NonStop Himalaya K10000 Interprocessor Bus R. Jardine, S. Hamilton, TSR 10,2 April 1994 104398 
K. Krishnakumar 
INTEGRITY S2 
Overview of the NonStop-UX Operating System for the Integrity S2 P. Norwood TSR 7,1 April 1991 46988 
MEASURE 
How to Set Up a Performance Data Base with MEASURE M. King TSR 2,3 Dec. 1986 83938 
and ENFORM 
MEASURE: Tandem's New Performance Measurement Tool D. Dennison TSR 2,3 Dec. 1986 83938 
MULTILAN 
Introduction to MULTILAN A. Coyle TSR 4,1 Feb. 1988 11078 
Overview of the MULTILAN Server A. Rowe TSR 4,1 Feb. 1988 11078 
Using the MULTILAN Application Interfaces M. Berg, A. Rowe TSR 4,1 Feb. 1988 11078 
NETBATCH-PLUS 

NetBatch: Managing Batch Processing on Tandem Systems D. Wakashige TSR 5,1 April 1989 18662 

NetBatch-Plus: Structuring the Batch Environment G. Earle, D. Wakashige TSR 6,1 March 1990 32986 
NONSTOP NET/MASTER 

NonStop NET/MASTER: Configuration and Performance Guidelines M. Stockton TSR 9,4 Fall 1993 89807 

NonStop NET/MASTER: Event Management Architecture M. Stockton TSR 9,4 Fall 1993 89807 

NonStop NET/MASTER: Event Processing Costs and M. Stockton TSR 9,4 Fall 1993 89807 


Sizing Calculations 
NONSTOP SQL 
An Overview of NonStop SQL Release 2 M. Pong TSR 6,2 Oct. 1990 46987 
Concurrency Control Aspects of Transaction Design W. Senf TSR 6,1 March 1990 32986 
Converting Database Files from ENSCRIBE to NonStop SQL W. Weikel TSR 6,1 March 1990 32986 
Gateways to NonStop SQL D. Slutz TSR 6,2 Oct. 1990 46987 
High-Performance SQL Through Low-Level System Integration A. Borr TSR 4,2 July 1988 13693 
NonStop SQL Data Dictionary R. Holbrook, D. Tsou TSR 4,2 July 1988 13693 
NonStop SQL: The Single Database Solution J. Cassidy, T. Kocher TSR 5,2 Sept. 1989 28152 
NonStop SQL Optimizer: Basic Concepts M. Pong TSR 4,2 July 1988 13693 
NonStop SQL Optimizer: Query Optimization and User Influence M. Pong TSR 4,2 July 1988 13693 
NonStop SQL Reliability C. Fenner TSR 4,2 July 1988 13693 
Overview of NonStop SQL H. Cohen TSR 4,2 July 1988 13693 
Parallelism in NonStop SQL Release 2 M. Moore, A. Sodhi TSR 6,2 Oct. 1990 46987 
Performance Benefits of Parallel Query Execution and Mixed S. Englert, J. Gray TSR 6,2 Oct. 1990 46987 
Workload Support in NonStop SQL Release 2 
Tandem’s NonStop SQL Benchmark Tandem Performance Group TSR 4,1 Feb. 1988 11078 
The NonStop SQL Release 2 Benchmark S. Englert, J. Gray, TSR 6,2 Oct. 1990 46987 
T. Kocher, P. Shah 
The Outer Join in NonStop SQL J. Vaishnav TSR 6,2 Oct. 1990 46987 
OSI 
Building Open Systems Interconnection with OSI/AS and OSI/TS R. Smith TSR 6,1 March 1990 32986 
The OSI Model: Overview, Status, and Current Issues A. Dunn TSR 5,1 April 1989 18662 
PATHFINDER 
PATHFINDER-An Aid for Application Development S. Benett TJ 1,1 Fall 1983 83930 
PATHWAY 
A New Design for the PATHWAY TCP R. Wong TJ 2,2 Spring 1984 83932 
PATHWAY IDS: A Message-level Interface to Devices and Processes M. Anderton, M. Noonan TSR 2,2 June 1986 83937 
Pathway TCP Enhancements for Application Run-Time Support R. Vannucci TSR 7,1 April 1991 46988 
The PATHWAY TCP: Performance and Tuning J. Vatz TSR 1,1 Feb. 1985 83934 
Understanding PATHWAY Statistics R. Wong TJ 2,2 Spring 1984 83932 
POET 
Designing Client/Server Applications for OLTP on W. Culman TSR 8,3 Fall 1992 89803 


Guardian 90 Systems 


PS MAIL 

Enhancements to PS MAIL R. Funk TSR 3,1 March 1987 83939 




Article title Author(s) Publication Volume, Issue Publication date Part number 
RDF 

RDF: An Overview J. Guerrero TSR 7,2 Oct. 1991 65248 

RDF Synchronization F. Jongma, W. Senf TSR 8,2 Summer 1992 69848 
RESPOND 

The RESPOND OLTP Business Management System H. Bolling, W. Bronson TSR 9,1 Winter 1993 89804 

for Manufacturing 
RSC 

Implementing Client/Server Using RSC M. Iem, T. Kocher TSR 8,3 Fall 1992 89803 
SAFEGUARD 

Dial-In Security Considerations P. Grainger TSR 7,2 Oct. 1991 65248 

Distributed Protection with SAFEGUARD T. Chou TSR 2,2 June 1986 83937 

Enhancing System Security With Safeguard C. Gaydos TSR 7,1 April 1991 46988 
SNAX 

An Overview of SNAX/CDF M. Turner TSR 5,2 Sept. 1989 28152 

A SNAX Passthrough Tutorial D. Kirk TJ 2,2 Spring 1984 83932 

SNAX/APC: Tandem’s New SNA Software for Distributed Processing B. Grantham TSR 3,1 March 1987 83939 

SNAX/HLS: An Overview S. Saltwick TSR 1,2 June 1985 83935 
SPOOLER 

Sizing the Spooler Collector Data File H. Norman TSR 4,1 Feb. 1988 11078 
TACL 

Debugging TACL Code L. Palmer TSR 4,2 July 1988 13693 

TACL, Tandem’s New Extensible Command Language J. Campbell, R. Glascock TSR 2,1 Feb. 1986 83936 
TAL 

New TAL Features C. Lu, J. Murayama TSR 2,2 June 1986 83937 
TCM 

Capacity Planning With TCM W. Highleyman TSR 7,2 Oct. 1991 65248 
TLAM 

TLAM: A Connectivity Option for Expand K. MacKenzie TSR 7,1 April 1991 46988 
TMDS 

C00 TMDS Performance J. Mead TSR 4,1 Feb. 1988 11078 

Enhancements to TMDS L. White TSR 3,2 Aug. 1987 83940 

Introducing TMDS, Tandem’s New On-line Diagnostic System J. Troisi TSR 1,2 June 1985 83935 
TMF 

Improvements in TMF T. Lemberger TSR 1,2 June 1985 83935 

TMF and the Multi-Threaded Requester T. Lemberger TJ 1,1 Fall 1983 83930 

TMF Autorollback: A New Recovery Feature M. Pong TSR 1,1 Feb. 1985 83934 
TNS/R 

Debugging Accelerated Programs on TNS/R Systems D. Cressler TSR 8,1 Spring 1992 65250 

Improving Performance on TNS/R Systems With the Accelerator M. Blanchet TSR 8,1 Spring 1992 65250 

Overview of Tandem NonStop Series/RISC Systems L. Faby, R. Mateosian TSR 8,1 Spring 1992 65250 




TRANSFER 

The TRANSFER Delivery System for Distributed Applications S. Van Pelt TJ 2,2 Spring 1984 83932 
TXP 

The High-Performance NonStop TXP Processor W. Bartlett, T. Houy, TJ 2,1 Winter 1984 83931 

D. Meyer 

The NonStop TXP Processor: A Powerful Design for On-line P. Oleinick TJ 2,3 Summer 1984 83933 

Transaction Processing 
V8 

The V8 Disc Storage Facility: Setting a New Standard for M. Whiteman TSR 1,2 June 1985 83935 

On-line Disc Storage 
VIEWSYS 

VIEWSYS: An On-line System-resource Monitor D. Montgomery TSR 1,2 June 1985 83935 
VLX 

NonStop VLX Hardware Design M. Brown TSR 2,3 Dec. 1986 83938 

NonStop VLX Performance J. Enright TSR 2,3 Dec. 1986 83938 

The VLX: A Design for Serviceability J. Allen, R. Boyle TSR 3,1 March 1987 83939 
XL8 

Data-encoding Technology Used in the XL8 Storage Facility D. S. Ng TSR 2,2 June 1986 83937 

Plated Media Technology Used in the XL8 Storage Facility D. S. Ng TSR 2,2 June 1986 83937 



Tandem Systems Review Order Form 


Use this form to order new subscriptions, change subscription information, and order back issues. 




[ ] I am a Tandem customer. My Tandem sales representative is 

[ ] I am not a Tandem customer and am enclosing a check or money order for the requests indicated on this form. (Subscriptions are $75 per year and each back issue is $20. Make checks payable to Tandem Computers Incorporated.) 


Subscription Information 


[ ] New subscription 
[ ] Update to subscription information 
Subscription number: 


Your subscription number is in the upper right corner of the 
mailing label. 


COMPANY 


NAME 


JOB TITLE 


DIVISION 


ADDRESS 


COUNTRY 


TELEPHONE NUMBER (include all codes for U.S. dialing) 


Title or position: 


[ ] President/CEO 
[ ] Director/VP information services 
[ ] MIS/DP manager 
[ ] Software development manager 
[ ] Programmer/analyst 
[ ] System operator 
[ ] End user 
[ ] Other: 


Your association with Tandem: 


[ ] Tandem customer 
[ ] Third-party vendor 
[ ] Consultant 
[ ] Other: 


Back Issue Requests 


Number of copies 

Tandem Systems Review 
____ Vol. 1, No. 1, Feb. 1985 
____ Vol. 1, No. 2, June 1985 
____ Vol. 2, No. 1, Feb. 1986 
____ Vol. 2, No. 2, June 1986 
____ Vol. 2, No. 3, Dec. 1986 
____ Vol. 3, No. 1, March 1987 
____ Vol. 3, No. 2, Aug. 1987 
____ Vol. 4, No. 1, Feb. 1988 
____ Vol. 4, No. 2, July 1988 
____ Vol. 4, No. 3, Oct. 1988 
____ Vol. 5, No. 1, April 1989 
____ Vol. 5, No. 2, Sept. 1989 
____ Vol. 6, No. 1, March 1990 
____ Vol. 6, No. 2, Oct. 1990 
____ Vol. 7, No. 1, April 1991 
____ Vol. 7, No. 2, Oct. 1991 
____ Vol. 8, No. 1, Spring 1992 
____ Vol. 8, No. 2, Summer 1992 
____ Vol. 8, No. 3, Fall 1992 
____ Vol. 9, No. 1, Winter 1993 
____ Vol. 9, No. 2, Spring 1993 
____ Vol. 9, No. 3, Summer 1993 
____ Vol. 9, No. 4, Fall 1993 
____ Vol. 10, No. 1, Jan. 1994 
____ Vol. 10, No. 2, April 1994 


Tandem Journal 
____ Vol. 1, No. 1, Fall 1983 
____ Vol. 2, No. 1, Winter 1984 
____ Vol. 2, No. 2, Spring 1984 
____ Vol. 2, No. 3, Summer 1984 


For questions or ordering information, call 
800-473-5868 in the U.S. and Canada or 
+1-408-285-0665 in other countries. 


Send this form to: 


Tandem Computers Incorporated 
Tandem Systems Review, Loc 208-65 
10400 Ridgeview Court 

Cupertino, CA 95014-0723 

FAX: +1-408-285-0840 


Tandem employees must order their subscrip- 
tions and back issues through Courier. 


Menu sequence: Marketing Information > 
Literature Orders > Technical Marketing 
Pubs (TSR) 


4/94 




BUSINESS REPLY MAIL 


FIRST CLASS PERMIT NO. 482 CUPERTINO, CA U.S.A. 


POSTAGE WILL BE PAID BY ADDRESSEE 


TANDEM SYSTEMS REVIEW 

LOC 208-65 

TANDEM COMPUTERS INCORPORATED 
19333 VALLCO PARKWAY 

CUPERTINO, CA 95014-9862 


NO POSTAGE NECESSARY IF MAILED IN THE UNITED STATES 


Tandem Systems Review Reader Survey 


The purpose of this questionnaire is to help the Tandem Systems Review staff select topics for 
publication. Postage is prepaid when mailed in the United States. Readers outside the U.S. should 
send their replies to their nearest Tandem sales office. 


1. How useful is each article in this issue? 

Product Update 
01 [ ] Indispensable 02 [ ] Very 03 [ ] Somewhat 04 [ ] Not at all 

The NonStop Himalaya K10000 Interprocessor Bus 
05 [ ] Indispensable 06 [ ] Very 07 [ ] Somewhat 08 [ ] Not at all 

Client/Server Availability 
09 [ ] Indispensable 10 [ ] Very 11 [ ] Somewhat 12 [ ] Not at all 

Automating Call Centers With CAM 
13 [ ] Indispensable 14 [ ] Very 15 [ ] Somewhat 16 [ ] Not at all 


2. I specifically would like to see more articles on (select one): 

17 [ ] Overview discussions of new products and enhancements 
18 [ ] Performance and tuning information 
19 [ ] High-level overviews on Tandem’s approach to solutions 
20 [ ] Application design and customer profiles 
21 [ ] Technical discussions of product internals 
22 [ ] Strategic information and statements of direction 
23 [ ] Other 


3. Your title or position: 

24 [ ] President, VP, Director 25 [ ] Systems analyst 26 [ ] System operator 
27 [ ] MIS manager 28 [ ] Software developer 29 [ ] End user 
30 [ ] Other 


4. Your association with Tandem: 

31 [ ] Tandem customer 32 [ ] Tandem employee 33 [ ] Third-party vendor 34 [ ] Consultant 
35 [ ] Other 


5. Comments 


NAME 


COMPANY NAME 


ADDRESS 






Tandem Computers Incorporated 
19333 Vallco Parkway 
Cupertino, CA 95014-2599 




4/94 Printed in USA 


Part No. 104398 


